Preface


Why have we written this book? In recent decades the field of financial risk man- 
agement has developed rapidly in response to both the increasing complexity of 
financial instruments and markets and the increasing regulation of the financial ser- 
vices industry. This book is devoted specifically to quantitative modelling issues 
arising in this field. As a result of our own discussions and joint projects with indus- 
try professionals and regulators over a number of years, we felt there was a need 
for a textbook treatment of quantitative risk management (QRM) at a technical yet 
accessible level, aimed at both industry participants and students seeking an entrance 
to the area. 

We have tried to bring together a body of methodology that we consider to be core 
material for any course on the subject. This material and its mode of presentation 
represent the blending of our own views, which come from the perspectives of 
financial mathematics, insurance mathematics and statistics. We feel that a book 
combining these viewpoints fills a gap in the existing literature and emphasises the 
fact that there is a need for quantitative risk managers in banks, insurance companies 
and beyond to have broad, interdisciplinary skills. 


What is new in this second edition? The second edition of this book has been 
extensively revised and expanded to reflect the continuing development of QRM 
methodology since the 2005 first edition. This period included the 2007-9 finan- 
cial crisis, during which much of the methodology was severely tested. While we 
have added to the detail, we are encouraged that we have not had to revise the 
main messages of the first edition in the light of the crisis. In fact, many of those 
messages—the importance of extremes and extremal dependence, systematic risk 
and the model risk inherent in portfolio credit models—proved to be central issues 
in the crisis. 

Whereas the first edition had a Basel and banking emphasis, we have added more 
material relevant to Solvency II and insurance in the second edition. Moreover, the 
methodological chapters now start at the natural starting point: namely, a discussion 
of the balance sheets and business models of a bank and an insurer. 

This edition contains an extended treatment of credit risk in four chapters, includ- 
ing new material on portfolio credit derivatives and counterparty credit risk. There 
is a new market-risk chapter, bringing together more detail on mapping portfolios 
to market-risk factors and applying and backtesting statistical methods. We have 
also extended the treatment of the fundamental topics of risk measures and risk 
aggregation. 

We have revised the structure of the book to facilitate teaching. The chapters are 
a little shorter than in the first edition, with more advanced or specialized material 
now placed in a series of “Special Topics” chapters at the end. The book is split into
four parts: (I) An Introduction to Quantitative Risk Management, (II) Methodology,
(III) Applications, (IV) Special Topics.


Who was this book written for? This book is primarily a textbook for courses 
on QRM aimed at advanced undergraduate or graduate students and professionals 
from the financial industry. A knowledge of probability and statistics at least at the 
level of a first university course in a quantitative discipline and familiarity with 
undergraduate calculus and linear algebra are fundamental prerequisites. Though 
not absolutely necessary, some prior exposure to finance, economics or insurance 
will be beneficial for a better understanding of some sections. 

The book has a secondary function as a reference text for risk professionals 
interested in a clear and concise treatment of concepts and techniques that are used 
in practice. As such, we hope it will facilitate communication between regulators, 
end-users and academics. 

A third audience for the book is the community of researchers that work in the 
area. Most chapters take the reader to the frontier of current, practically relevant 
research and contain extensive, annotated references that guide the reader through 
the vast literature. 


Ways to use this book. The material in this book has been tested on many different 
audiences, including undergraduate and postgraduate students at ETH Zurich, the 
Universities of Zurich and Leipzig, Heriot-Watt University, the London School of 
Economics and the Vienna University of Economics and Business. It has also been 
used for professional training courses aimed at risk managers, actuaries, consultants 
and regulators. Based on this experience we can suggest a number of ways of using 
the book. 

A taught course would generally combine material from Parts I, II and III, although
the exact choice of material from Parts II and III would depend on the emphasis of
the course. Chapters 2 and 3 from Part I would generally be core taught modules,
whereas Chapter 1 might be prescribed as background reading material.

A general course on QRM could be based on a complete treatment of Parts I-III.
This would require a minimum of two semesters, with 3-4 hours of taught
courses per week for an introductory course and longer for a detailed treatment. 
A quantitative course on enterprise risk management for actuaries would follow a 
very similar selection, probably omitting material from Chapters 11 and 12, which 
contain Basel-specific details of portfolio credit risk modelling and an introduction 
to portfolio credit derivatives. 

For a course on credit risk modelling, there is a lot of material to choose from. 
A comprehensive course spanning two semesters would include Part I (probably 
omitting Chapter 3), Chapters 6 and 7 from Part II, and Chapters 10-12 from Part III.
Material on counterparty credit risk (Chapter 17) might also be included from Part IV. 

A one-semester, specialized course on market risk could be based on Part I, 
Chapters 4-6 from Part II, and Chapter 9 from Part III. An introduction to risk
management for financial econometricians could follow a similar selection but might
cover all the chapters in Part II. 

It is also possible to devise more specialized courses, such as a course on risk- 
measurement and aggregation concepts based on Chapters 2, 7 and 8. Moreover, 
material from various chapters could be used as interesting examples to enliven 
statistics courses on subjects like multivariate analysis, time-series analysis and 
generalized linear modelling. In Part IV there are a number of potential topics for 
seminars at postgraduate and PhD level. 


What we have not covered. We have not been able to address all the topics that a 
reader might expect to find under the heading of QRM. Perhaps the most obvious 
omission is the lack of a section on the risk management of derivatives by hedging. 
Here we felt that the relevant techniques, and the financial mathematics required 
to understand them, are already well covered in a number of excellent textbooks. 
Other omissions include modelling techniques for price liquidity risk and models 
for systemic risk in national and global networks of financial firms, both of which 
have been areas of research since the 2007-9 crisis. Besides these larger areas, 
many smaller issues have been neglected for reasons of space but are mentioned 
with suggestions for further reading in the “Notes and Comments” sections, which 
should be considered as integral parts of the text. 
1


Risk in Perspective 


In this chapter we provide a non-mathematical discussion of various issues that form 
the background to the rest of the book. In Section 1.1 we begin with the nature of 
risk itself and discuss how risk relates to randomness; in the financial context (which 
includes insurance) we summarize the main kinds of risks encountered and explain 
what it means to measure and manage such risks. 

A brief history of financial risk management and the development of financial 
regulation is given in Section 1.2, while Section 1.3 contains a summary of the 
regulatory framework in the financial and insurance industries. 

In Section 1.4 we take a step back and attempt to address the fundamental question 
of why we might want to measure and manage risk at all. Finally, in Section 1.5 we 
turn to quantitative risk management (QRM) explicitly and set out our own views 
concerning the nature of this discipline and the challenge it poses. This section in 
particular should give more insight into our choice of methodological topics in the 
rest of the book. 


1.1 Risk 


The Concise Oxford English Dictionary defines risk as “hazard, a chance of bad 
consequences, loss or exposure to mischance”. In a discussion with students taking 
a course on financial risk management, ingredients that are typically discussed are 
events, decisions, consequences and uncertainty. It is mostly only the downside of 
risk that is mentioned, rarely a possible upside, i.e. the potential for a gain. While 
for many people risk has largely negative connotations, it may also represent an 
opportunity. Much of the financial industry would not exist were it not for the 
presence of financial risk and the opportunities afforded to companies that are able 
to create products and services that offer more financial certainty to their clients. 

For financial risks no single one-sentence definition of risk is entirely satisfactory. 
Depending on context, one might arrive at notions such as “any event or action that 
may adversely affect an organization’s ability to achieve its objectives and execute its 
strategies” or, alternatively, “the quantifiable likelihood of loss or less-than-expected 
returns”. 


1.1.1 Risk and Randomness 


Regardless of context, risk strongly relates to uncertainty, and hence to the notion of 
randomness. Randomness has eluded a clear, workable definition for many centuries; 
it was not until 1933 that the Russian mathematician A. N. Kolmogorov gave an
axiomatic definition of randomness and probability (see Kolmogorov 1933). This 
definition and its accompanying theory provide the language for the majority of the 
literature on risk, including this book. 

Our reliance on probability may seem unsatisfactorily narrow to some. It bypasses 
several of the current debates on risk and uncertainty (Frank Knight), the writings on 
probabilistic thinking within economics (John Maynard Keynes), the unpredictabil- 
ity of unprecedented financial shocks, often referred to as Black Swans (Nassim 
Taleb), or even the more political expression of the known, the unknown and the 
unknowable (Donald Rumsfeld); see the Notes and Comments section for more 
explanation. Although these debates are interesting and important, at some point 
clear definitions and arguments are called for and this is where mathematics as a lan- 
guage enters. The formalism of Kolmogorov, while not the only possible approach, 
is a tried-and-tested framework for mathematical reasoning about risk. 

In Kolmogorov’s language a probabilistic model is described by a triplet
$(\Omega, \mathcal{F}, P)$. An element $\omega$ of $\Omega$ represents a realization of an
experiment, in economics often referred to as a state of nature. The statement “the
probability that an event $A$ occurs” is denoted (and in Kolmogorov’s axiomatic system
defined) as $P(A)$, where $A$ is an element of $\mathcal{F}$, the set of all events. $P$ denotes
the probability measure. For the less mathematically trained reader it suffices to accept
that Kolmogorov’s system translates our intuition about randomness into a concise, 
axiomatic language and clear rules. 

Consider the following examples: an investor who holds stock in a particular 
company; an insurance company that has sold an insurance policy; an individual 
who decides to convert a fixed-rate mortgage into a variable one. All of these sit- 
uations have something important in common: the investor holds today an asset 
with an uncertain future value. This is very clear in the case of the stock. For the 
insurance company, the policy sold may or may not be triggered by the underly- 
ing event covered. In the case of a mortgage, our decision today to enter into this 
refinancing agreement will change (for better or for worse) the future repayments. 
So randomness plays a crucial role in the valuation of current products held by the 
investor, the insurance company and the home owner. 

To model these situations a mathematician would now define the value of a risky
position $X$ to be a function on the probability space $(\Omega, \mathcal{F}, P)$; this function
is called a random variable. We leave for the moment the range of $X$ (i.e. its possible
values) unspecified. Most of the modelling of a risky position $X$ concerns its distribution
function $F_X(x) = P(X \le x)$: the probability that by the end of the period under
consideration the value of the risk $X$ is less than or equal to a given number $x$.
Several risky positions would then be denoted by a random vector $(X_1, \dots, X_d)$,
also written in bold face as $\boldsymbol{X}$; time can be introduced, leading to the notion of
random (or so-called stochastic) processes, usually written $(X_t)$. Throughout this
book we will encounter many such processes, which serve as essential building
blocks in the mathematical description of risk.

We therefore expect the reader to be at ease with basic notation, terminology and
results from elementary probability and statistics, the branch of mathematics dealing
with stochastic models and their application to the real world. The word “stochastic”
is derived from the Greek “stochazesthai”, the art of guessing, or “stochastikos”,
meaning skilled at aiming (“stochos” being a target). In discussing stochastic meth- 
ods for risk management we hope to emphasize the skill aspect rather than the 
guesswork. 
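
To make the notion of a distribution function concrete, the following minimal sketch estimates $F_X(x) = P(X \le x)$ from simulated observations of a risky position. The standard normal model for $X$, the sample size and the function name `empirical_cdf` are assumptions made purely for illustration; they are not taken from the text.

```python
import random


def empirical_cdf(sample, x):
    """Fraction of observations <= x, an estimate of F_X(x) = P(X <= x)."""
    return sum(1 for value in sample if value <= x) / len(sample)


random.seed(0)
# Hypothetical risky position: value changes drawn from a standard normal
# distribution (an illustrative assumption, not a recommended model).
sample = [random.gauss(0.0, 1.0) for _ in range(10_000)]

for x in (-2.0, 0.0, 2.0):
    print(x, empirical_cdf(sample, x))
```

The estimate is non-decreasing in $x$ and lies between 0 and 1, as any distribution function must.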


1.1.2 Financial Risk 


In this book we discuss risk in the context of finance and insurance (although many 
of the tools introduced are applicable well beyond this context). We start by giving 
a brief overview of the main risk types encountered in the financial industry. 

The best-known type of risk is probably market risk: the risk of a change in 
the value of a financial position or portfolio due to changes in the value of the 
underlying components on which that portfolio depends, such as stock and bond 
prices, exchange rates, commodity prices, etc. The next important category is credit 
risk: the risk of not receiving promised repayments on outstanding investments such 
as loans and bonds, because of the “default” of the borrower. A further risk category 
is operational risk: the risk of losses resulting from inadequate or failed internal 
processes, people and systems, or from external events. 

The three risk categories of market, credit and operational risk are the main ones 
we study in this book, but they do not form an exhaustive list of the full range 
of possible risks affecting a financial institution, nor are their boundaries always 
clearly defined. For example, when a corporate bond falls in value this is market 
risk, but the fall in value is often associated with a deterioration in the credit quality 
of the issuer, which is related to credit risk. The ideal way forward for a successful 
handling of financial risk is a holistic approach, i.e. an integrated approach taking 
all types of risk and their interactions into account. 

Other important notions of risk are model risk and liquidity risk. The former is 
the risk associated with using a misspecified (inappropriate) model for measuring 
risk. Think, for instance, of using the Black-Scholes model for pricing an exotic 
option in circumstances where the basic Black-Scholes model assumptions on the 
underlying securities (such as the assumption of normally distributed returns) are 
violated. It may be argued that model risk is always present to some degree. 

When we talk about liquidity risk we are generally referring to price or market 
liquidity risk, which can be broadly defined as the risk stemming from the lack 
of marketability of an investment that cannot be bought or sold quickly enough to 
prevent or minimize a loss. Liquidity can be thought of as “oxygen for a healthy
market”; a market requires it to function properly but most of the time we are not
aware of its presence. Its absence, however, is recognized immediately, with often 
disastrous consequences. 

In banking, there is also the concept of funding liquidity risk, which refers to 
the ease with which institutions can raise funding to make payments and meet 
withdrawals as they arise. The management of funding liquidity risk tends to be 
a specialist activity of bank treasuries (see, for example, Choudhry 2012) rather
than trading-desk risk managers and is not a subject of this book. However, funding 
liquidity and market liquidity can interact profoundly in periods of financial stress. 
Firms that have problems obtaining funding may sell assets in fire sales to raise cash, 
and this in turn can contribute to market illiquidity, depressing prices, distorting the 
valuation of assets on balance sheets and, in turn, making funding even more difficult 
to obtain; this phenomenon has been described as a liquidity spiral (Brunnermeier 
and Pedersen 2009). 

In insurance, a further risk category is underwriting risk: the risk inherent in 
insurance policies sold. Examples of risk factors that play a role here are changing 
patterns of natural catastrophes, changes in demographic tables underlying (long- 
dated) life products, political or legal interventions, or customer behaviour (such as 
lapsation). 


1.1.3 Measurement and Management 


Much of this book is concerned with techniques for the statistical measurement of 
risk, an activity which is part of the process of managing risk, as we attempt to 
clarify in this section. 


Risk measurement. Suppose we hold a portfolio consisting of $d$ underlying investments
with respective weights $w_1, \dots, w_d$, so that the change in value of the portfolio
over a given holding period (the so-called profit and loss, or P&L) can be written as
$X = \sum_{i=1}^{d} w_i X_i$, where $X_i$ denotes the change in value of the $i$th investment.
Measuring the risk of this portfolio essentially consists of determining its distribution
function $F_X(x) = P(X \le x)$, or functionals describing this distribution function
such as its mean, variance or 99th percentile.

In order to achieve this, we need a properly calibrated joint model for the under-
lying random vector of investments $(X_1, \dots, X_d)$, so statistical methodology has
an important role to play in risk measurement; based on historical observations and
given a specific model, a statistical estimate of the distribution of the change in
value of a position, or one of its functionals, is calculated. In Chapter 2 we develop
a detailed framework for risk measurement. As we shall see (and this
is indeed a main theme throughout the book), this is by no means an easy task with
a unique solution.
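
As a rough illustration of the measurement step, the sketch below simulates scenarios for the value changes of three investments, forms the portfolio P&L as the weighted sum of those changes, and computes the mean, variance and 99th percentile of the simulated P&L. The weights, the volatilities and above all the assumption of independent normal value changes are hypothetical choices made for brevity; a calibrated joint model would replace them in practice.

```python
import math
import random
import statistics


def portfolio_pnl(weights, changes):
    """Portfolio P&L for one scenario: the sum of w_i * X_i."""
    return sum(w * x for w, x in zip(weights, changes))


def empirical_quantile(sample, level):
    """Smallest observation q with at least a fraction `level` of the sample <= q."""
    ordered = sorted(sample)
    index = max(math.ceil(level * len(ordered)) - 1, 0)
    return ordered[index]


random.seed(1)
weights = [0.5, 0.3, 0.2]   # hypothetical weights w_1, ..., w_d with d = 3
vols = [0.02, 0.01, 0.03]   # hypothetical standard deviations of X_1, ..., X_d

# Assumption: independent normal value changes (no dependence is modelled).
scenarios = [[random.gauss(0.0, s) for s in vols] for _ in range(5000)]
pnl = [portfolio_pnl(weights, scenario) for scenario in scenarios]

print("mean:", statistics.mean(pnl))
print("variance:", statistics.variance(pnl))
print("99th percentile:", empirical_quantile(pnl, 0.99))
```

Different model choices for the joint distribution of the value changes would lead to different estimates, which is one reason why the task has no unique solution.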

It should be clear from the outset that good risk measurement is essential. Increas- 
ingly, the clients of financial institutions demand objective and detailed information 
on the products that they buy, and firms can face legal action when this information 
is found wanting. For any product sold, a proper quantification of the underlying 
risks needs to be explicitly made, allowing the client to decide whether or not the 
product on offer corresponds to his or her risk appetite; the 2007-9 crisis saw numer- 
ous violations of this basic principle. For more discussion of the importance of the 
quantitative approach to risk, see Section 1.5. 


Risk management. In a very general answer to the question of what risk manage- 
ment is about, Kloman (1990) writes: 


To many analysts, politicians, and academics it is the management of
environmental and nuclear risks, those technology-generated macro- 
risks that appear to threaten our existence. To bankers and financial 
officers it is the sophisticated use of such techniques as currency hedging 
and interest-rate swaps. To insurance buyers or sellers it is coordination 
of insurable risks and the reduction of insurance costs. To hospital 
administrators it may mean “quality assurance”. To safety professionals 
it is reducing accidents and injuries. In summary, risk management is 
a discipline for living with the possibility that future events may cause 
adverse effects. 


The last phrase in particular (the emphasis is ours) captures the general essence of 
risk management: it is about ensuring resilience to future events. For a financial 
institution one can perhaps go further. A financial firm’s attitude to risk is not pas- 
sive and defensive; a bank or insurer actively and willingly takes on risk, because it 
seeks a return and this does not come without risk. Indeed, risk management can be 
seen as the core competence of an insurance company or a bank. By using its exper- 
tise, market position and capital structure, a financial institution can manage risks 
by repackaging or bundling them and transferring them to markets in customized 
ways. 

The management of risk at financial institutions involves a range of tasks. To 
begin with, an enterprise needs to determine the capital it should hold to absorb 
losses, both for regulatory and economic capital purposes. It also needs to manage 
the risk on its books. This involves ensuring that portfolios are well diversified and 
optimizing portfolios according to risk-return considerations. The risk profile of
the portfolio can be altered by hedging exposures to certain risks, such as interest- 
rate or foreign-exchange risk, using derivatives. Alternatively, some risks can be 
repackaged and sold to investors in a process known as securitization; this has 
been applied to both insurance risks (weather derivatives and longevity derivatives) 
and credit risks (mortgage-backed securities, collateralized debt obligations). Firms 
that use derivatives need to manage their derivatives books, which involves the 
tasks of pricing, hedging and managing collateral for such trades. Finally, financial 
institutions need to manage their counterparty credit risk exposures to important 
trading partners; these arise from bilateral, over-the-counter derivatives trades, but 
they are also present, for example, in reinsurance treaties. 

We also note that the discipline of risk management is very much the core com- 
petence of an actuary. Indeed, the Institute and Faculty of Actuaries has used the 
following definition of the actuarial profession: 


Actuaries are respected professionals whose innovative approach to 
making business successful is matched by a responsibility to the public 
interest. Actuaries identify solutions to financial problems. They man- 
age assets and liabilities by analysing past events, assessing the present 
risk involved and modelling what could happen in the future. 


Actuarial organizations around the world have collaborated to create the Chartered
Enterprise Risk Actuary qualification to show their commitment to establishing best 
practice in risk management. 


1.2 A Brief History of Risk Management 


In this section we treat the historical development of risk management by sketching 
some of the innovations and some of the events that have shaped modern risk man- 
agement for the financial industry. We also describe the more recent development 
of regulation in the industry, which has, to some extent, been a process of reaction 
to a series of incidents and crises. 


1.2.1 From Babylon to Wall Street 


Although risk management has been described as “one of the most important inno- 
vations of the 20th century” by Steinherr (1998), and most of the story we tell is 
relatively modern, some concepts that are used in modern risk management, and in 
derivatives in particular, have been around for longer. In our selective account we 
stress the example of financial derivatives as these have played a role in many of 
the events that have shaped modern regulation and increased the complexity of the 
risk-management challenge. 


The ancient world to the twentieth century. A derivative is a financial instrument 
derived from an underlying asset, such as an option, future or swap. For example, 
a European call option with strike K and maturity T gives the holder the right, 
but not the obligation, to obtain from the seller at maturity the underlying security 
for a price K; a European put option gives the holder the right to dispose of the 
underlying at a price K. 
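
The payoffs implied by these definitions can be sketched in a few lines. The function names and the numerical values are hypothetical; the sketch covers only the value at maturity and ignores the premium paid for the option.

```python
def call_payoff(price_at_maturity, strike):
    """Value at maturity of the right to buy the underlying for `strike`."""
    return max(price_at_maturity - strike, 0.0)


def put_payoff(price_at_maturity, strike):
    """Value at maturity of the right to sell the underlying for `strike`."""
    return max(strike - price_at_maturity, 0.0)


strike = 100.0  # hypothetical strike K
for price in (80.0, 100.0, 120.0):
    print(price, call_payoff(price, strike), put_payoff(price, strike))
```

Because the holder has a right and not an obligation, neither payoff can be negative: the option is simply left unexercised when exercising would lose money.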

Dunbar (2000) interprets a passage in the Code of Hammurabi from Babylon 
of 1800 BC as being early evidence of the use of the option concept to provide 
financial cover in the event of crop failure. A very explicit mention of options 
appears in Amsterdam towards the end of the seventeenth century and is beautifully 
narrated by Joseph de la Vega in his 1688 Confusión de Confusiones, a discussion 
between a lawyer, a trader and a philosopher observing the activity on the Beurs 
of Amsterdam. Their discussion contains what we now recognize as European call 
and put options and a description of their use for investment as well as for risk 
management—it even includes the notion of short selling. In an excellent recent 
translation (de la Vega 1996) we read: 


If I may explain “opsies” [further, I would say that] through the payment 
of the premiums, one hands over values in order to safeguard one’s stock 
or to obtain a profit. One uses them as sails for a happy voyage during 
a beneficent conjuncture and as an anchor of security in a storm. 


After this, de la Vega continues with some explicit examples that would not be out 
of place in any modern finance course on the topic. 

Financial derivatives in general, and options in particular, are not so new. More- 
over, they appear here as instruments to manage risk, “anchors of security in a 
storm”, rather than as dangerous instruments of speculation, the “wild beasts of
finance” (Steinherr 1998), that many believe them to be. 


Academic innovation in the twentieth century. While the use of risk-management 
ideas such as derivatives can be traced further back, it was not until the late twentieth 
century that a theory of valuation for derivatives was developed. This can be seen 
as perhaps the most important milestone in an age of academic developments in the 
general area of quantifying and managing financial risk. 

Before the 1950s, the desirability of an investment was mainly equated to its 
return. In his groundbreaking publication of 1952, Harry Markowitz laid the founda- 
tion of the theory of portfolio selection by mapping the desirability of an investment 
onto a risk-return diagram, where risk was measured using standard deviation (see
Markowitz 1952, 1959). Through the notion of an efficient frontier the portfolio 
manager could optimize the return for a given risk level. The following decades 
saw explosive growth in risk-management methodology, including such ideas as 
the Sharpe ratio, the Capital Asset Pricing Model (CAPM) and Arbitrage Pricing 
Theory (APT). Numerous extensions and refinements that are now taught in any 
MBA course on finance followed. 

The famous Black-Scholes-Merton formula for the price of a European call
option appeared in 1973 (see Black and Scholes 1973). The importance of this 
formula was underscored in 1997 when the Bank of Sweden Prize in Economic 
Sciences in Memory of Alfred Nobel was awarded to Robert Merton and Myron 
Scholes (Fischer Black had died some years earlier) “for a new method to determine 
the value of derivatives”. 

In the final two decades of the century the mathematical finance literature devel- 
oped rapidly, and many ideas found their way into practice. Notable contributions 
include the pioneering papers by Harrison and Kreps (1979) and Harrison and Pliska 
(1981) clarifying the links between no-arbitrage pricing and martingale theory. A 
further example is the work on the term structure of interest rates by Heath, Jarrow 
and Morton (1992). These and other papers elaborated the mathematical founda- 
tions of financial mathematics. Textbooks on stochastic integration and Itô calculus
became part of the so-called quant’s essential reading and were, for a while, as likely 
to be seen in the hands of a young investment banker as the Financial Times. 


Growth of markets in the twentieth century. The methodology developed for the 
rational pricing and hedging of financial derivatives changed finance. The “wizards 
of Wall Street” (i.e. the mathematical specialists conversant in the new methodology) 
have had a significant impact on the development of financial markets over the last 
few decades. Not only did the new option-pricing formula work, it transformed 
the market. When the Chicago Options Exchange first opened in 1973, fewer than 
a thousand options were traded on the first day. By 1995, over a million options were 
changing hands each day, with current nominal values outstanding in the derivatives 
markets in the tens of trillions. So great was the role played by the Black-Scholes-Merton
formula in the growth of the new options market that, when the American
stock market crashed in 1987, the influential business magazine Forbes attributed
the blame squarely to that one formula. Scholes himself has said that it was not so
much the formula that was to blame, but rather that market traders had not become 
sufficiently sophisticated in using it. 

Along with academic innovation, developments in information technology (IT) 
also helped lay the foundations for an explosive growth in the volume of new 
risk-management and investment products. This development was further aided by 
worldwide deregulation in the 1980s. Important additional factors contributing to an 
increased demand for risk-management skills and products were the oil crises of the 
1970s and the collapse, between 1971 and 1973, of the Bretton Woods system of fixed exchange rates.
Both energy prices and foreign exchange risk became highly volatile risk factors and 
customers required products to hedge them. The 1933 Glass-Steagall Act, passed
in the US in the aftermath of the 1929 Depression to prohibit commercial banks from
underwriting insurance and most kinds of securities, indirectly paved the way for
the emergence of investment banks, hungry for new business. Glass-Steagall was
replaced in 1999 by the Financial Services Act, which repealed many of the former’s
key provisions, although the 2010 Dodd-Frank Act, passed in the aftermath of the
2007-9 financial crisis, appears to mark an end to the trend of deregulation. 


Disasters of the 1990s. In January 1992 the president of the New York Federal 
Reserve, E. Gerald Corrigan, speaking at the Annual Mid-Winter Meeting of the 
New York State Bankers Association, said: 


You had all better take a very, very hard look at off-balance-sheet activ- 
ities. The growth and complexity of [these] activities and the nature of 
the credit settlement risk they entail should give us cause for concern.... 
I hope this sounds like a warning, because it is. Off-balance-sheet activ- 
ities [i.e. derivatives] have a role, but they must be managed and con- 
trolled carefully and they must be understood by top management as 
well as by traders and rocket scientists. 


Corrigan was referring to the growing volume of derivatives in banks’ trading books 
and the fact that, in many cases, these did not appear as assets or liabilities on the 
balance sheet. His words proved prescient. 

On 26 February 1995 Barings Bank was forced into administration. A loss of 
£700 million ruined the oldest merchant banking group in the UK (established in 
1761). Besides numerous operational errors (violating every qualitative guideline in 
the risk-management handbook), the final straw leading to the downfall of Barings 
was a so-called straddle position on the Nikkei held by the bank’s Singapore-based 
trader Nick Leeson. A straddle combines a call and a put with the same strike; 
Leeson’s was a short straddle, in which both options are sold. Such a position allows 
for a gain if the underlying (in this case the Nikkei index) does not move too far up 
or down. There is, however, considerable loss potential if the index moves down (or up) 
by a large amount, and this is precisely what happened when the Kobe earthquake occurred. 
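
To make this payoff profile concrete, the following minimal sketch computes the profit or loss at expiry of a short straddle. The strike, the premium received and the index levels are hypothetical round numbers chosen purely for illustration; they are not the actual Barings positions. 

```python
def short_straddle_pnl(index_level, strike, premium_received):
    """Profit or loss at expiry of a short call plus a short put, same strike.

    The seller keeps the premium but must pay out on whichever option
    finishes in the money.
    """
    call_payout = max(index_level - strike, 0.0)
    put_payout = max(strike - index_level, 0.0)
    return premium_received - call_payout - put_payout


# Hypothetical figures (assumptions for illustration only).
strike = 19000.0
premium = 1000.0

for level in (17000.0, 18500.0, 19000.0, 19500.0, 21000.0):
    print(level, short_straddle_pnl(level, strike, premium))
```

The gain is capped at the premium and is earned only while the index stays within the premium's distance of the strike; beyond that band the loss grows one for one with the size of the move, in either direction. 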

Three years later, Long-Term Capital Management (LTCM) became another 
prominent casualty of losses due to derivatives trading when it required a $3.5 bil- 
lion payout to prevent collapse, a case made all the more piquant by the fact that 
Myron Scholes and Robert Merton were principals at the hedge fund. Referring to 
the Black-Scholes formula, an article in the Observer newspaper asked: “Is this 
really the key to future wealth? Win big, lose bigger.” 

There were other important cases in this era, leading to a widespread discussion of 
the need for increased regulation, including Metallgesellschaft in 1993 (speculation 
on oil prices using derivatives) and Orange County in 1994 (speculation on interest 
rates using derivatives). 

In the life insurance industry, Equitable Life, the world’s oldest mutual insurer, 
provided a case study of what can happen when the liabilities arising from insurance 
products with embedded options are not properly hedged. Prior to 1988, Equitable 
Life had sold pension products that offered the option of a guaranteed annuity rate 
at maturity of the policy. The guarantee rate of 7% had been set in the 1970s when 
inflation and annuity rates were high, but in 1993 the current annuity rate fell below 
the guarantee rate and policyholders exercised their options. Equitable Life had not 
been hedging the option and it quickly became evident that they were faced with 
an enormous increase in their liabilities; the Penrose Report (finally published in 
March 2004) concluded that Equitable Life was underfunded by around £4.5 billion 
by 2001. It was the policyholders who suffered when the company reneged on 
their pension promises, although many of the company’s actions were later ruled 
unlawful and some compensation from the public purse was agreed. However, this 
case provides a good illustration of the need to regulate the capital adequacy of 
insurers to protect policyholders. 


The turn of the century. The end of the twentieth century proved to be a pivotal 
moment for the financial system worldwide. From a value of around 1000 in 1996, 
the Nasdaq index quintupled to a maximum value of 5048.62 on 10 March 2000 
(which remains unsurpassed as this book goes to press). The era 1996-2000 is now 
known as the dot-com bubble because many of the firms that contributed to the rise 
in the Nasdaq belonged to the new internet sector. 

In a speech before the American Enterprise Institute on 5 December 1996, Alan 
Greenspan, chairman of the Federal Reserve from 1987 to 2006, said, “But how 
do we know when irrational exuberance has unduly escalated asset values, which then 
become subject to unexpected and prolonged contractions as they have in Japan over the 
past decade?” The term irrational exuberance seemed to perfectly describe the times. The 
Dow Jones Industrial Average was also on a historic climb, breaking through the 
10 000 barrier on 29 March 1999, and prompting books with titles like Dow 40 000: 
Strategies for Profiting from the Greatest Bull Market in History. It took four years 
for the bubble to burst, but from its March 2000 maximum the Nasdaq plummeted 
to half of its value within a year and tested the 1000 barrier in late 2002. Equity 
indices fell worldwide, although markets recovered and began to surge ahead again 
from 2004. 

The dot-com bubble was in many respects a conventional asset bubble, but it was 
also during this period that the seeds of the next financial crisis were being sown. 
Financial engineers had discovered the magic of securitization: the bundling and 
repackaging of many risks into securities with defined risk profiles that could be 
sold to potential investors. While the idea of transferring so-called tranches of a pool 
of risks to other risk bearers was well known to the insurance world, it was now 
being applied on a massive scale to credit-risky assets, such as mortgages, bonds, 
credit card debt and even student loans (see Section 12.1.1 for a description of the 
tranching concept). 

In the US, the subprime lending boom to borrowers with low credit ratings fuelled 
the supply of assets to securitize and a market was created in mortgage-backed 
securities (MBSs). These in turn belonged to the larger pool of assets that were 
available to be transformed into collateralized debt obligations (CDOs). The banks 
originating these credit derivative products had found a profitable business turning 
poor credit risks into securities. The volume of credit derivatives ballooned over a 
very short period; the CDO market accounted for almost $3 trillion in nominal terms 
by 2008 but this was dwarfed by the nominal value of the credit default swap (CDS) 
market, which stood at about $30 trillion. 

Credit default swaps, another variety of credit derivative, were originally used 
as instruments for hedging large corporate bond exposures, but they were now 
increasingly being used by investors to speculate on the changing credit outlook 
of companies by adopting so-called naked positions (see Section 10.1.4 for more 
explanation). Although the economic value of the CDS and CDO markets was 
considerably smaller (when the netting of cash flows is considered), these are still huge 
figures when compared with world gross domestic product (GDP), which was of 
the order of $60 trillion at that time. 

The consensus was that all this activity was a good thing. Consider the follow- 
ing remarks made by the then chairman of the Federal Reserve, Alan Greenspan, 
before the Council on Foreign Relations in Washington DC on 19 November 
2002 (Greenspan 2002): 


More recently, instruments ... such as credit default swaps, collateral- 
ized debt obligations and credit-linked notes have been developed and 
their use has grown rapidly in recent years. The result? Improved credit 
risk management together with more and better risk-management tools 
appear to have significantly reduced loan concentrations in telecommu- 
nications and, indeed, other areas and the associated stress on banks and 
other financial institutions.... It is noteworthy that payouts in the still 
relatively small but rapidly growing market in credit derivatives have 
been proceeding smoothly for the most part. Obviously this market is 
still too new to have been tested in a widespread down-cycle for credit, 
but, to date, it appears to have functioned well. 


As late as April 2006 the International Monetary Fund (IMF) wrote in its Global 
Financial Stability Report that: 


There is a growing recognition that the dispersion of credit risk by 
banks to a broader and more diverse group of investors, rather than 
warehousing such risks on their balance sheets, has helped to make the 
banking and overall financial system more resilient.... The improved 
resilience may be seen in fewer bank failures and more consistent credit 
provision. Consequently, the commercial banks, a core segment of the 
financial system, may be less vulnerable today to credit or economic 
shocks. 


It has to be said that the same IMF report also warned about possible vulnerabilities, 
and the potential for market disruption, if these credit instruments were not fully 
understood. 

One of the problems was that not all of the risk from CDOs was being dispersed 
to outside investors as the IMF envisaged. As reported in Acharya et al. (2009), large 
banks were holding on to a lot of it themselves: 


These large, complex financial institutions ignored their own busi- 
ness models of securitization and chose not to transfer credit risk 
to other investors. Instead they employed securitization to manufac- 
ture and retain tail risk that was systemic in nature and inadequately 
capitalized. ... Starting in 2006, the CDO group at UBS noticed that their 
risk-management systems treated AAA securities as essentially risk- 
free even though they yielded a premium (the proverbial free lunch). 
So they decided to hold onto them rather than sell them! After holding 
less than $5 billion of them in 02/06, the CDO desk was warehousing a 
staggering $50 billion in 09/07.... Similarly, by late summer of 2007, 
Citigroup had accumulated over $55 billion of AAA-rated CDOs. 


On the eve of the crisis many in the financial industry seemed unconcerned. AIG, 
the US insurance giant, had become heavily involved in underwriting MBS and CDO 
risk by selling CDS protection through its AIG Financial Products arm. In August 
2007 the chief executive officer of AIG Financial Products is quoted as saying: 


It is hard for us, without being flippant, to even see a scenario within 
any kind of realm of reason that would see us losing one dollar in any 
of these transactions. 


The financial crisis of 2007-9. After a peak in early 2006, US house prices began 
to decline in 2006 and 2007. Subprime mortgage holders, experiencing difficulties 
in refinancing their loans at higher interest rates, defaulted on their payments in 
increasing numbers. Starting in late 2007 this led to a rapid reassessment of the 
riskiness of securitizations and to losses in the value of CDO securities. Banks were 
forced into a series of dramatic write-downs of the value of these assets on their 
balance sheets, and the severity of the impending crisis became apparent. 

Reflecting on the crisis in his article “It doesn’t take Nostradamus” in the 2008 
issue of Economists’ Voice, Nobel laureate Joseph E. Stiglitz recalled the views he 
expressed in 1992 on securitization and the housing market: 


The question is, has the growth of securitization been a result of more 
efficient transaction technologies or an unfounded reduction in concern 
about the importance of screening loan applicants? It is perhaps too 
early to tell, but we should at least entertain the possibility that it is the 
latter rather than the former. 


He also wrote: 


At the very least, the banks have demonstrated ignorance of two very 
basic aspects of risk: (a) the importance of correlation... [and] (b) the 
possibility of price declines. 


These “basic aspects of risk”, which would appear to belong in a Banking 101 
class, plunged the world’s economy into its most serious crisis since the late 1920s. 
Salient events included the demise of such illustrious names as Bear Stearns (which 
collapsed and was sold to JPMorgan Chase in March 2008) and Lehman Brothers 
(which filed for Chapter 11 bankruptcy on 15 September 2008). The latter event in 
particular led to worldwide panic. As markets tumbled and liquidity vanished it was 
clear that many banks were on the point of collapse. Governments had to bail them 
out by injecting capital or by acquiring their distressed assets in arrangements such 
as the US Troubled Asset Relief Program. 

AIG, which had effectively been insuring the default risk in securitized products 
by selling CDS protection, got into difficulty when many of the underlying securities 
defaulted; the company that could not foresee itself “losing one dollar in any of these 
transactions” required an emergency loan facility of $85 billion from the Federal 
Reserve Bank of New York on 16 September 2008. In the view of George Soros 
(2009), CDSs were “instruments of destruction” that should be outlawed: 


Some derivatives ought not to be allowed to be traded at all. I have in 
mind credit default swaps. The more I’ve heard about them, the more 
I’ve realised they’re truly toxic. 


Much has been written about these events, and this chapter’s Notes and Comments 
section contains a number of references. One strand of the commentary that is 
relevant for this book is the apportioning of a part of the blame to mathematicians 
(or financial engineers); the failure of valuation models for complex securitized 
products made them an easy target. Perhaps the most publicized attack came in a 
blog by Felix Salmon (Wired Magazine, 23 February 2009) under the telling title 
“Recipe for disaster: the formula that killed Wall Street”. The formula in question 
was the Gauss copula, and its application to credit risk was attributed to David Li. 
Inspired by what he had learned on an actuarial degree, Li proposed that a tool for 
modelling dependent lifetimes in life insurance could be used to model correlated 
default times in bond portfolios, thus providing a framework for the valuation and 
risk management of CDOs, as we describe in Chapter 12. 

While an obscure formula with a strange name was a gift for bloggers and news- 
paper headline writers, even serious regulators joined in the chorus of criticism of 
mathematics. The Turner Review of the global banking crisis (Lord Turner 2009) 
has a section entitled “Misplaced reliance on sophisticated mathematics” (see Sec- 
tion 1.3.3 for more on this theme). But this reliance on mathematics was only one 
factor in the crisis, and certainly not the most important. Mathematicians had also 
warned well beforehand that the world of securitization was being built on shaky 
model foundations that were difficult to calibrate (see, for example, Frey, McNeil 
and Nyfeler 2001). It was also abundantly clear that political shortsightedness, the 
greed of market participants and the slow reaction of regulators had all contributed 
in very large measure to the scale of the eventual calamity. 


Recent developments and concerns. New threats to the financial system emerge 
all the time. The financial crisis of 2007-9 led to recession and sovereign debt 
crises. After the wave of bank bailouts, concerns about the solvency of banks were 
transformed into concerns about the abilities of countries to service their own debts. 
For a while doubts were cast on the viability of the eurozone, as it seemed that 
countries might elect to, or be forced to, exit the single currency. 

On the more technical side, the world of high-frequency trading has raised con- 
cerns among regulators, triggered by such events as the Flash Crash of 6 May 2010. 
In this episode, due to “computer trading gone wild”, the Dow Jones lost around 1000 
points in a couple of minutes, only to be rapidly corrected. High-frequency trading is 
a form of algorithmic trading in which trades are executed by computers according 
to algorithms in fractions of a second. One notable casualty of algorithmic trading 
was Knight Capital, which lost $460 million due to trading errors on 1 August 2012. 
Going forward, it is clear that vigilance is required concerning the risks arising from 
the deployment of new technologies and their systemic implications. 

Indeed, systemic risk is an ongoing concern to which we have been sensitized by 
the financial crisis. This is the risk of the collapse of the entire financial system due to 
the propagation of financial stress through a network of participants. When Lehman 
Brothers failed there was a moment when it seemed possible that there could be 
a catastrophic cascade of defaults of banks and other firms. The interbank lending 
market had become dysfunctional, asset prices had plummeted and the market for 
any form of debt was highly illiquid. Moreover, the complex chains of relationships 
in the CDS markets, in which the same credit-risky assets were referenced in a large 
volume of bilateral payment agreements, led to the fear that the default of a further 
large player could cause other banks to topple like dominoes. 

The concerted efforts of many governments were successful in forestalling the 
Armageddon scenario. However, since the crisis, the study of financial networks 
and their embedded systemic risks has become an important research topic. These 
networks are complex, and as well as banks and insurance companies they contain 
members of a “shadow banking system” of hedge funds and structured investment 
vehicles, which are largely unregulated. One important theme is the identification 
of so-called systemically important financial institutions (SIFIs) whose failure might 
cause a systemic crisis. 


1.2.2 The Road to Regulation 


There is no doubt that regulation goes back a long way, at least to the time of the 
Venetian banks and the early insurance enterprises sprouting in London’s coffee 
shops in the eighteenth century. In those days there was more reliance on self- 
regulation or local regulation, but rules were there. However, the key developments 
that led to the present prudential regulatory framework in financial services are a 
very much more recent story. 

The main aim of modern prudential regulation has been to ensure that financial 
institutions have enough capital to withstand financial shocks and remain solvent. 
Robert Jenkins, a member of the Financial Policy Committee of the Bank of England, 
was quoted in the Independent on 27 April 2012 as saying: 


Capital is there to absorb losses from risks we understand and risks we 
may not understand. Evidence suggests that neither risk-takers nor their 
regulators fully understand the risks that banks sometimes take. That’s 
why banks need an appropriate level of loss absorbing equity. 


Much of the regulatory drive originated from the Basel Committee on Banking 
Supervision. This committee was established by the central-bank governors of the 
Group of Ten at the end of 1974. The Group of Ten is made up of (oddly) eleven 
industrial countries that consult and cooperate on economic, monetary and financial 
matters. The Basel Committee does not possess any formal supranational supervis- 
ing authority, and hence its conclusions do not have legal force. Rather, it formulates 
broad supervisory standards and guidelines and recommends statements of best prac- 
tice in the expectation that individual authorities will take steps to implement them 
through detailed arrangements—statutory or otherwise—that are best suited to their 
own national system. The summary below is brief. Interested readers can consult, 
for example, Tarullo (2008) for further details, and should also see this chapter’s 
Notes and Comments section. 


The first Basel Accord. The first Basel Accord on Banking Supervision (Basel I, 
from 1988) took an important step towards an international minimum capital stan- 
dard. Its main emphasis was on credit risk, by then clearly the most important source 
of risk in the banking industry. In hindsight, however, Basel I took an approach that 
was fairly coarse and measured risk in an insufficiently differentiated way. In measur- 
ing credit risk, claims were divided into three crude categories according to whether 
the counterparties were governments, regulated banks or others. For instance, the 
risk capital charge for a loan to a corporate borrower was five times higher than for 
a loan to an Organisation for Economic Co-operation and Development (OECD) 
bank. The risk weighting for all corporate borrowers was identical, independent of 
their credit rating. The treatment of derivatives was also considered unsatisfactory. 
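
The factor of five can be reproduced with a few lines of arithmetic. The sketch below assumes the standard Basel I risk weights of 0%, 20% and 100% for OECD governments, OECD banks and corporate borrowers, together with the 8% minimum capital ratio; these figures are not stated in the text above and the three-category table is a simplification of the full Accord. 

```python
# Assumed Basel I risk weights for the three crude categories in the text
# (simplified; the Accord contains further categories).
RISK_WEIGHTS = {
    "oecd_government": 0.0,
    "oecd_bank": 0.2,
    "corporate": 1.0,
}
MIN_CAPITAL_RATIO = 0.08  # assumed 8% minimum ratio of capital to risk-weighted assets


def basel1_capital_charge(exposure, category):
    """Capital charge = exposure x risk weight x minimum capital ratio."""
    return exposure * RISK_WEIGHTS[category] * MIN_CAPITAL_RATIO


loan = 100.0  # hypothetical loan size, in millions
corporate_charge = basel1_capital_charge(loan, "corporate")
bank_charge = basel1_capital_charge(loan, "oecd_bank")
print(corporate_charge, bank_charge, corporate_charge / bank_charge)
```

Because the weight depends only on the category, a loan to a highly rated corporate and a loan to a weak one attract the same charge, which is the lack of differentiation criticized above. 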


The birth of VaR. In 1993 the G-30 (an influential international body consisting 
of senior representatives of the private and public sectors and academia) published 
a seminal report addressing, for the first time, so-called off-balance-sheet products, 
like derivatives, in a systematic way. Around the same time, the banking industry 
clearly saw the need for proper measurement of the risks stemming from these new 
products. At JPMorgan, for instance, the famous Weatherstone 4.15 report asked 
for a one-day, one-page summary of the bank’s market risk to be delivered to the 
chief executive officer in the late afternoon (hence “4.15”). Value-at-risk (VaR) as 
a market risk measure was born and the JPMorgan methodology, which became 
known as RiskMetrics, set an industry-wide standard. 

In a highly dynamic world with round-the-clock market activity, instant market 
valuation of trading positions (known as marking-to-market) became a necessity. 
Moreover, in markets where so many positions (both long and short) were 
written on the same underlyings, managing risks based on simple aggregation of 
nominal positions became unsatisfactory. Banks pushed to be allowed to consider 
netting effects, i.e. the compensation of long versus short positions on the same 
underlying. 
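
The difference between simple aggregation and netting can be illustrated with a small sketch. The trading book below is hypothetical, and the function shows only the basic idea of offsetting long and short notionals on the same underlying; it is not a regulatory netting rule. 

```python
from collections import defaultdict


def gross_and_net_exposure(positions):
    """Return (gross, net) nominal exposure for a list of positions.

    positions: iterable of (underlying, signed_notional) pairs, where a
    positive notional is a long position and a negative one is short.
    Gross exposure adds absolute notionals; net exposure first offsets
    longs against shorts on each underlying.
    """
    net_by_underlying = defaultdict(float)
    gross = 0.0
    for underlying, notional in positions:
        gross += abs(notional)
        net_by_underlying[underlying] += notional
    net = sum(abs(value) for value in net_by_underlying.values())
    return gross, net


# Hypothetical book: notionals in millions.
book = [("NIKKEI", 100.0), ("NIKKEI", -80.0), ("DAX", 50.0), ("DAX", -50.0)]
gross, net = gross_and_net_exposure(book)
print(gross, net)
```

For this book the simple sum of nominal positions is 280, whereas the netted figure is only 20, which indicates why banks regarded the unnetted measure as a poor guide to their actual risk. 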

In 1996 an important amendment to Basel I prescribed a so-called standardized 
model for market risk, but at the same time allowed the bigger (more sophisticated) 
banks to opt for an internal VaR-based model (i.e. a model developed in house). 
Legal implementation was to be achieved by the year 2000. The coarseness problem 
for credit risk remained unresolved and banks continued to claim that they were not 
given enough incentives to diversify credit portfolios and that the regulatory capital 
rules currently in place were far too risk insensitive. Because of overcharging on 
the regulatory capital side of certain credit positions, banks started shifting business 
away from certain market segments that they perceived as offering a less attractive 
risk-return profile. 


The second Basel Accord. By 2001 a consultative process for a new Basel Accord 
(Basel II) had been initiated; the basic document was published in June 2004. An 
important aspect was the establishment of the three-pillar system of regulation: 
Pillar 1 concerns the quantification of regulatory capital; Pillar 2 imposes regulatory 
oversight of the modelling process, including risks not considered in Pillar 1; and 
Pillar 3 defines a comprehensive set of disclosure requirements. 

Under Pillar 1 the main theme of Basel II was credit risk, where the aim was to 
allow banks to use a finer, more risk-sensitive approach to assessing the risk of their 
credit portfolios. Banks could opt for an internal-ratings-based approach, which 
permitted the use of internal or external credit-rating systems wherever appropriate. 

The second important theme of Basel II at the level of Pillar 1 was the consideration 
of operational risk as a new risk class. A basic premise of Basel II was that the overall 
size of regulatory capital throughout the industry should stay unchanged under the 
new rules. Since the new rules for credit risk were likely to reduce the credit risk 
charge, this opened the door for operational risk, defined as the risk of losses resulting 
from inadequate or failed internal processes, people and systems or from external 
events; this definition included legal risk but excluded reputational and strategic 
risk. 

Mainly due to the financial crisis of 2007-9, implementation of the Basel II 
guidelines across the globe met with delays and was rather spread out in time. Various 
further amendments and additions to the content of the original 2004 document were 
made. One important criticism of Basel II that emerged from the crisis was that it 
was inherently procyclical, in that it forced firms to take action to increase their 
capital ratios at exactly the wrong point in the business cycle, when their actions 
had a negative impact on the availability of liquidity and made the situation worse 
(see Section 1.3.3 for more discussion on this). 


Basel 2.5. One clear lesson from the crisis was that modern products like CDOs had 
opened up opportunities for regulatory arbitrage by transferring credit risk from the 
capital-intensive banking book (or loan book) to the less-capitalized trading book. 
Some enhancements to Basel II were proposed in 2009 with the aim of addressing 
the build-up of risk in the trading book that was evident during the crisis. These 
enhancements, which have come to be known as Basel 2.5, include a stressed VaR 
charge, based on calculating VaR from data for a twelve-month period of market 
turmoil, and the so-called incremental risk charge, which seeks to capture some 
of the default risk in trading book positions; there were also specific new rules for 
certain securitizations. 


The third Basel Accord. In view of the failure of the Basel rules to prevent the 
2007-9 crisis, the recognized deficiencies of Basel II mentioned above, and the 
clamour from the public and from politicians for regulatory action to make banks 
and the banking system safer, it is no surprise that attention quickly shifted to 
Basel III. 

In 2011 a series of measures was proposed that would extend Basel II (and 2.5) 
in five main areas: 


(1) measures to increase the quality and amount of bank capital by changing the 
definition of key capital ratios and allowing countercyclical adjustments to 
these ratios in crises; 


(2) a strengthening of the framework for counterparty credit risk in derivatives 
trading, with incentives to use central counterparties (exchanges); 


(3) the introduction of a leverage ratio to prevent excessive leverage; 


(4) the introduction of various ratios that ensure that banks have sufficient funding 
liquidity; 

(5) measures to force systemically important banks to have even higher capacity 
to absorb losses. 


Most of the new rules will be phased in progressively, with a target end date of 
2019, although individual countries may impose stricter guidelines with respect to 
both schedule and content. 


Parallel developments in insurance regulation. The insurance industry worldwide 
has also been subject to increasing risk regulation in recent times. However, here the 
story is more fragmented and there has been much less international coordination of 
efforts. The major exception has been the development of the Solvency II framework 
in the European Union, a process described in more detail below. As the most 
detailed and model intensive of the regulatory frameworks proposed, it serves as 
our main reference point for insurance regulation in this book. The development of 
the Solvency II framework is overseen by the European Insurance and Occupational 
Pensions Authority (EIOPA; formerly the Committee of European Insurance and 
Occupational Pensions Supervisors (CEIOPS)), but the implementation in individual 
countries is a matter for national regulators, e.g. the Prudential Regulation Authority 
in the UK. 

In the US, insurance regulation has traditionally been a matter for state gov- 
ernments. The National Association of Insurance Commissioners (NAIC) provides 
support to insurance regulators from the individual states, and helps to promote the 
development of accepted regulatory standards and best practices; it is up to the indi- 
vidual states whether these are passed into law, and if so in what form. In the early 
1990s the NAIC promoted the concept of risk-based capital for insurance compa- 
nies as a response to a number of insolvencies in the preceding years; the NAIC 
describes risk-based capital as “a method of measuring the minimum amount of 
capital appropriate for a reporting entity to support its overall business operations in 
consideration of its size and risk profile”. The method, which is a rules-based approach 
rather than a model-based approach, has become the main plank of insurance regu- 
lation in the US. 

Federal encroachment on insurance supervision has generally been resisted, 
although this may change due to a number of measures enacted after the 2007-9 
crisis in the wide-ranging 2010 Dodd-Frank Act. These include the creation of both 
the Federal Insurance Office, to “monitor all aspects of the insurance sector”, and 
the Financial Stability Oversight Council, which is “charged with identifying risks 
to the financial stability of the United States” wherever they may arise in the world 
of financial services. 

The International Association of Insurance Supervisors has been working to foster 
some degree of international convergence in the processes for regulating the capital 
adequacy of insurers. They have promoted the idea of the Own Risk and Solvency 
Assessment (ORSA). This has been incorporated into the Solvency II framework 
and has also been embraced by the NAIC in the US. 

There are also ongoing initiatives that aim to bring about convergence of bank- 
ing and insurance regulation, particularly with respect to financial conglomerates 
engaged in both banking and insurance business. The Joint Forum on Financial 
Conglomerates was established in early 1996 under the aegis of the Basel Com- 
mittee, the International Association of Insurance Supervisors and the International 
Organization of Securities Commissions to take forward this work. 


From Solvency I to Solvency II. Mirroring the progress in the banking sector, 
Solvency II is the latest stage in a process of regulatory evolution from simple and 
crude rules to a more risk-sensitive treatment of the capital requirements of insurance 
companies. 

The first European Union non-life and life directives on solvency margins 
appeared around 1970. The solvency margin was defined as an extra capital buffer 
against unforeseen events such as higher than expected claims levels or unfavourable 
investment results. However, there were differences in the way that regulation was 
applied across Europe and there was a desire for more harmonization of regulation 
and mutual recognition. 

Solvency I, which came into force in 2004, is a rather coarse rules-based frame- 
work calling for companies to have a minimum guarantee fund (minimal capital) 
of €3 million, and a solvency margin consisting of 16-18% of non-life premi- 
ums together with 4% of the technical provisions for life. This has led to a single 
robust system that is easy to understand and inexpensive to monitor. However, on 
the negative side, it is mainly volume based, not explicitly risk based; issues like 
guarantees, embedded options and the proper matching of assets and liabilities are 
largely neglected in many countries. 

To address these shortcomings, Solvency II was initiated in 2001 with the publica- 
tion of the influential Sharma Report. While the Solvency II directive was adopted 
by the Council of the European Union and the European Parliament in Novem- 
ber 2009, implementation of the framework is not expected until 1 January 2016. 
The process of refinement of the framework is managed by EIOPA, and one of the 
features of this process has been a series of quantitative impact studies in which 
companies have effectively tried out aspects of the proposals and information has 
been gathered with respect to the impact and practicability of the new regulations. 

The goal of the Solvency II process is that the new framework should strengthen 
the capital adequacy regime by reducing the possibilities of consumer loss or market 
disruption in insurance; Solvency II therefore has both policyholder-protection and 
financial-stability motives. Moreover, it is also an aim that the harmonization of 
regulation in Europe should promote deeper integration of the European Union 
insurance market and the increased competitiveness of European insurers. A high- 
level description of the Solvency II framework is given in Section 1.3.2. 


The Swiss Solvency Test (SST). Special mention should be made of Switzerland, 
which has already developed and implemented its own principles-based risk capital 
regulation for the insurance industry. The SST has been in force since 1 January 
2011. It follows similar principles to Solvency II but differs in some details of its 
treatment of different types of risk; it also places more emphasis on the development 
of internal models. The implementation of the SST falls under the remit of the Swiss 
Financial Market Supervisory Authority (FINMA), a body formed in 2007 from the merger
of the banking and insurance supervisors, which has statutory authority over banks, 
insurers, stock exchanges, collective investment schemes and other entities. 


1.3 The Regulatory Framework 


This section describes in more detail the framework that has emerged from the Basel 
process and the European Union solvency process. 


1.3.1 The Basel Framework 


As indicated in Section 1.2.2, the Basel framework should be regarded as the product 
of an evolutionary process. As this book goes to press, the Basel II and Basel 2.5 
proposals have been implemented in many developed countries (with some variations
in detail), while the proposals of Basel III are still being debated and refined.
We sketch the framework as currently implemented, before indicating some of the
proposed changes and additions to the framework in Basel III.


The three-pillar concept. A key feature of the Basel framework is the three-pillar
concept, as is apparent from the following statement summarizing the Basel phi- 
losophy, which accompanied the original Basel II publication (Basel Committee on 
Banking Supervision 2004): 


The Basel II Framework sets out the details for adopting more risk- 
sensitive minimum capital requirements [Pillar 1] for banking orga- 
nizations. The new framework reinforces these risk-sensitive require- 
ments by laying out principles for banks to assess the adequacy of their 
capital and for supervisors to review such assessments to ensure banks 
have adequate capital to support their risks [Pillar 2]. It also seeks to 
strengthen market discipline by enhancing transparency in banks’ finan- 
cial reporting [Pillar 3]. The text that has been released today reflects the 
results of extensive consultations with supervisors and bankers world- 
wide. It will serve as the basis for national rule-making and approval 
processes to continue and for banking organizations to complete their 
preparations for the new Framework’s implementation. 


Under Pillar 1, banks are required to calculate a minimum capital charge, referred 
to as regulatory capital. There are separate Pillar 1 capital charges for credit risk in 
the banking book, market risk in the trading book and operational risk, which are 
considered to be the main quantifiable risks. Most banks use internal models based 
on VaR methodology to compute the capital charge for market risk. For credit risk 
and operational risk banks may choose between several approaches of increasing 
risk sensitivity and complexity, some details of which are discussed below. 

Pillar 2 recognizes that any quantitative approach to risk management should be 
embedded in a properly functioning corporate governance structure. Best-practice 
risk management imposes constraints on the organization of the institution, i.e. the 
board of directors, management, employees, and internal and external audit pro- 
cesses. In particular, the board of directors assumes the ultimate responsibility for 
oversight of the risk landscape and the formulation of the company’s risk appetite. 
Through Pillar 2, also referred to as the supervisory review process, local regulators 
review the various checks and balances that have been put in place. Under Pillar 2, 
residual quantifiable risks that are not included in Pillar 1, such as interest-rate risk 
in the banking book, must be considered and stress tests of a bank’s capital adequacy 
must be performed. The aim is to ensure that the bank holds capital in line with its 
true economic loss potential, a concept known as economic capital. 

Finally, in order to fulfil its promise that increased regulation will increase trans- 
parency and diminish systemic risk, clear reporting guidelines on the risks carried 
by financial institutions are called for. Pillar 3 seeks to establish market discipline 
through a better public disclosure of risk measures and other information relevant 
to risk management. In particular, banks will have to offer greater insight into the 
adequacy of their capitalization. 


Credit and market risk; the banking and trading books. Historically, banking activ- 
ities have been organized around the banking book and the trading book, a split that 
reflects different accounting practices for different kinds of assets. The banking book
contains assets that are held to maturity, such as loans; these are typically valued at 
book value, based on the original cost of the asset. The trading book contains assets 
and instruments that are available to trade; these are generally valued by marking- 
to-market (i.e. using quoted market prices). From a regulatory point of view, credit 
risk is mainly identified with the banking book and market risk is mainly identified 
with the trading book. 

We have already noted that there are problems with this simple dichotomy and 
that the Basel 2.5 rules were introduced (partly) to account for the neglect of credit 
risk (default and rating-migration risk) in the trading book. There are also forms of 
market risk in the banking book, such as interest-rate risk and foreign-exchange risk. 
However, the Basel framework continues to observe the distinction between banking 
book and trading book and we will describe the capital charges in terms of the two 
books. It is clear that the distinction is somewhat arbitrary and rests on the concept 
of “available to trade”. Moreover, there can be incentives to “switch” or move instru- 
ments from one book to the other (particularly from the banking book to the trading 
book) to benefit from a more favourable capital treatment. This is acknowledged by 
the Basel Committee in its background discussion of the “Fundamental review of 
the trading book: a revised market risk framework” (Basel Committee on Banking 
Supervision 2013a): 


The Committee believes that the definition of the regulatory boundary 
between the trading book and the banking book has been a source of 
weakness in the design of the current regime. A key determinant of the 
boundary has been banks’ self-determined intent to trade.... Coupled 
with large differences in capital requirements against similar types of 
risk on either side of the boundary, the overall capital framework proved 
susceptible to arbitrage before and during the crisis.... To reduce the 
incentives for arbitrage, the Committee is seeking a less permeable 
boundary with strict limits on switching between books and measures 
to prevent “capital benefit” in instances where switching is permitted. 


The capital charge for the banking book. The credit risk of the banking book port- 
folio is assessed as the sum of risk-weighted assets: that is, the sum of notional 
exposures weighted by a coefficient reflecting the creditworthiness of the counter- 
party (the risk weight). To calculate risk weights, banks use either the standardized 
approach or one of the more advanced internal-ratings-based (IRB) approaches. 
The choice of method depends on the size and complexity of the bank, with the 
larger, international banks being required to adopt IRB approaches. The capital charge is
determined as a fraction of the sum of risk-weighted assets in the portfolio. This 
fraction, known as the capital ratio, was 8% under Basel II but is already being 
increased ahead of the planned implementation of Basel III in 2019.
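
As a minimal sketch of the arithmetic just described, the following Python fragment computes risk-weighted assets and the resulting capital charge for a small banking book. The Exposure class, the function names and the three risk weights are our own illustrative assumptions and are not taken from the Basel texts; only the 8% capital ratio comes from the discussion above.

```python
from dataclasses import dataclass


@dataclass
class Exposure:
    notional: float      # exposure amount in currency units
    risk_weight: float   # 0.5 means a 50% risk weight (hypothetical values)


def risk_weighted_assets(exposures):
    """Sum of notional exposures weighted by their risk weights."""
    return sum(e.notional * e.risk_weight for e in exposures)


def capital_charge(exposures, capital_ratio=0.08):
    """Capital charge as a fraction (the capital ratio) of total RWA."""
    return capital_ratio * risk_weighted_assets(exposures)


# Hypothetical three-position banking book; the weights are illustrative only.
book = [
    Exposure(notional=100.0, risk_weight=0.0),  # e.g. a highly rated sovereign bond
    Exposure(notional=200.0, risk_weight=0.5),  # e.g. a residential mortgage
    Exposure(notional=50.0, risk_weight=1.0),   # e.g. an unsecured corporate loan
]
rwa = risk_weighted_assets(book)   # 0 + 100 + 50 = 150
charge = capital_charge(book)      # 8% of 150, i.e. 12 up to floating-point rounding
```

Raising the capital ratio, as is happening in the transition away from Basel II, scales the charge proportionally without changing the risk-weighted assets themselves.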

The standardized approach refers to a system that has been in place since Basel I, 
whereby the risk weights are prescribed by the regulator according to the nature 
and creditworthiness of the counterparty. For example, there are risk weights for 
retail loans secured on property (mortgages) and for unsecured retail loans (such as
credit cards and overdrafts); there are also different risk weights for corporate and 
government bonds with different ratings. 

Under the more advanced IRB approaches, banks may dispense with the system 
of fixed risk weights provided by the regulator. Instead, they may make an internal 
assessment of the riskiness of a credit exposure, expressing this in terms of an esti- 
mated annualized probability of default and an estimated loss given default, which 
are used as inputs in the calculation of risk-weighted assets. The total sum of risk- 
weighted assets is calculated using formulas specified by the Basel Committee; the 
formulas also take into account the fact that there is likely to be positive correlation 
(sometimes called systematic risk) between the credit risks in the portfolio. The use 
of internally estimated probabilities of default and losses given default allows for 
increased risk sensitivity in the IRB capital charges compared with the standard- 
ized approach. It should be noted, however, that the IRB approaches do not permit 
fully internal models of credit risk in the banking book; they only permit internal 
estimation of inputs to a model that has been specified by the regulator. 
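
To illustrate how internally estimated inputs enter a regulator-specified model, the sketch below implements a simplified version of the one-factor formula that underlies the IRB risk-weight functions. It is a sketch under stated assumptions and not the regulatory text: the maturity adjustment is omitted, the asset correlation rho is passed in as a free parameter (whereas the regulatory formulas prescribe it), and the function names are our own.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution


def irb_capital_requirement(pd, lgd, rho, confidence=0.999):
    """Unexpected-loss capital per unit of exposure (no maturity adjustment).

    pd  : estimated one-year probability of default, in (0, 1)
    lgd : estimated loss given default, as a fraction of exposure
    rho : asset correlation with the systematic factor, in [0, 1)
    """
    if not 0.0 < pd < 1.0:
        raise ValueError("pd must lie strictly between 0 and 1")
    if not 0.0 <= rho < 1.0:
        raise ValueError("rho must lie in [0, 1)")
    # Default probability conditional on an adverse systematic scenario.
    stressed_pd = _STD_NORMAL.cdf(
        (_STD_NORMAL.inv_cdf(pd) + sqrt(rho) * _STD_NORMAL.inv_cdf(confidence))
        / sqrt(1.0 - rho)
    )
    # Capital covers the stressed loss in excess of the expected loss.
    return lgd * (stressed_pd - pd)


def irb_risk_weighted_asset(ead, pd, lgd, rho):
    """Risk-weighted asset for one exposure; 12.5 is the reciprocal of 8%."""
    return 12.5 * ead * irb_capital_requirement(pd, lgd, rho)


# Hypothetical inputs: 1% default probability, 45% loss given default.
k = irb_capital_requirement(pd=0.01, lgd=0.45, rho=0.15)
rwa_example = irb_risk_weighted_asset(ead=100.0, pd=0.01, lgd=0.45, rho=0.15)
```

With rho set to zero the stressed default probability collapses to pd and the requirement vanishes, which shows that the capital charge in this model is driven entirely by the systematic (correlated) component of credit risk.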


The capital charge for the trading book. For market risk in the trading book there 
is also the option of a standardized approach based on a system of risk weights and 
specific capital charges for different kinds of instrument. However, most major banks 
elect to use an internal VaR model approach, as permitted by the 1996 amendment 
to Basel I. In Sections 2.2 and 9.2 of this book we give a detailed description of 
the VaR approach to trading book risk measurement. The approach is based on the 
estimation of a P&L distribution for a ten-day holding period and the estimation of 
a particular percentile of this distribution: the 99th percentile of the losses. 

A ten-day VaR at 99% of $20 million therefore means that it is estimated that our 
market portfolio will incur a loss of $20 million or more with probability 1% by the 
end of a ten-day holding period, if the composition remains fixed over this period. 
The conversion of VaR numbers into an actual capital charge is accomplished by a 
formula that we discuss in Section 2.3.3. 
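
The interpretation of VaR as a percentile can be made concrete with a small simulation. The sketch below is purely illustrative: we assume, hypothetically, that ten-day losses are normally distributed with mean zero and a standard deviation of $8.6 million (chosen so that the theoretical 99th percentile is close to $20 million), and we estimate VaR as an empirical quantile of the simulated losses. Real internal models are far more elaborate.

```python
import math
import random
from statistics import NormalDist


def value_at_risk(losses, alpha=0.99):
    """Empirical alpha-quantile of a sample of losses (positive = loss)."""
    if not losses:
        raise ValueError("need at least one loss observation")
    ordered = sorted(losses)
    # Smallest observation with at least a fraction alpha of the sample at or below it.
    k = max(0, math.ceil(alpha * len(ordered)) - 1)
    return ordered[k]


rng = random.Random(0)   # fixed seed for reproducibility
sigma_10d = 8.6          # hypothetical ten-day loss standard deviation ($ million)
losses = [rng.gauss(0.0, sigma_10d) for _ in range(100_000)]

var_99 = value_at_risk(losses, alpha=0.99)
# Fraction of simulated ten-day losses that exceed the VaR estimate.
exceed_freq = sum(loss > var_99 for loss in losses) / len(losses)
# Closed-form 99% quantile under the same normal assumption, for comparison.
normal_var_99 = sigma_10d * NormalDist().inv_cdf(0.99)
```

By construction, roughly 1% of the simulated losses exceed var_99, which is exactly the statement made in the text about a loss of $20 million or more.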

The VaR calculation is the main component of risk quantification for the trading 
book, but the 2009 Basel 2.5 revision added further elements (see Basel Committee 
on Banking Supervision 2012, p. 10), including the following. 


Stressed VaR: banks are required to carry out a VaR calculation essentially using 
the standard VaR methodology but calibrating their models to a historical twelve- 
month period of significant financial stress. 


Incremental risk charge: Since default and rating-migration risk are not generally 
considered in the standard VaR calculation, banks must calculate an additional 
charge based on an estimate of the 99.9th percentile of the one-year distribution 
of losses due to defaults and rating changes. In making this calculation they may 
use internal models for credit risk (in contrast to the banking book) but must also 
take into account the market liquidity of credit-risky instruments. 


Securitizations: exposures to securitizations in the trading book are subject to
a series of new capital charges that bring them more into line with equivalent 
exposures in the banking book. 


The capital charge for operational risk. There are also options of increasing 
sophistication for assessing operational risk. Under the basic-indicator and stan- 
dardized approaches, banks may calculate their operational risk charge using sim- 
ple formulas based on gross annual income. Under the advanced measurement 
approach, banks may develop internal models. Basel is not prescriptive about the 
form of these models provided they capture the tail risk of extreme events; most such 
models are based on historical loss data (internal and external to the firm) and use 
techniques that are drawn from the actuarial modelling of general insurance losses. 
We provide more detail in Chapter 13. 


New elements of Basel III. Under Basel III there will be a number of significant 
changes and additions to the Basel framework. While the detail of the new rules 
may change before final implementation in 2019, the main developments are now 
clear. 


• Banks will need to hold both more capital and better-quality capital as a
function of the risks taken. The “better quality” is achieved through a more
restrictive definition of eligible capital (through more stringent definitions of
Tier 1 and Tier 2 capital and the phasing out of Tier 3 capital); see Section 2.1.3
for more explanation of capital tiers. The “more” comes from the addition (on 
top of the minimum ratio of 8%) of a capital conservation buffer of 2.5% of 
risk-weighted assets, for building up capital in good times to absorb losses 
under stress, and a countercyclical buffer within the range 0-2.5%, in order 
to enhance the shock resilience of banks and limit expansion in periods of 
excessive credit growth. This leads to a total (Tier 1 plus Tier 2) ratio of up 
to 13%, compared with Basel II’s 8%. There will be a gradual phasing in of 
all these new ratios, with a target date for full implementation of 1 January 
2019. 


• A leverage ratio will be imposed to put a floor under the build-up of excessive
leverage in the banking system. Leverage will essentially be measured through 
the ratio of Tier 1 capital to total assets. A minimum ratio of 3% is currently 
being tested but the precise definitions may well change as a result of testing 
experience and bank lobbying. The leverage limit will restrain the size of bank 
assets, regardless of their riskiness. 


• The risk coverage of the system of capital charges is being extended, in particular
to include a charge for counterparty credit risk. When counterparty credit
risk is taken into account in the valuation of over-the-counter derivative contracts,
the default-risk-free value has to be adjusted by an amount known as the
credit value adjustment (CVA); see Section 17.2 for more explanation. There 
will now be a charge for changes in CVA. 


• Banks will become subject to liquidity rules; this is a completely new direction
for the Basel framework, which has previously been concerned only with
capital adequacy. A liquidity coverage ratio will be introduced to ensure that
banks have enough highly liquid assets to withstand a period of net cash
outflow lasting thirty days. A net stable funding ratio will ensure that sufficient
funding is available in order to cover long-term commitments (exceeding one
year).
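
The capital and leverage figures quoted in the list above can be combined in a few lines of code. The following sketch uses only the numbers given in the text (a minimum ratio of 8%, a conservation buffer of 2.5%, a countercyclical buffer between 0 and 2.5%, and a leverage minimum of 3%); the function names are our own and the definitions of capital and assets are deliberately left crude.

```python
def total_capital_requirement(countercyclical_buffer=0.0,
                              minimum_ratio=0.08,
                              conservation_buffer=0.025):
    """Required total capital as a fraction of risk-weighted assets."""
    if not 0.0 <= countercyclical_buffer <= 0.025:
        raise ValueError("countercyclical buffer must lie in the range 0-2.5%")
    return minimum_ratio + conservation_buffer + countercyclical_buffer


def leverage_ratio(tier1_capital, total_assets):
    """Tier 1 capital divided by total (not risk-weighted) assets."""
    return tier1_capital / total_assets


def meets_leverage_minimum(tier1_capital, total_assets, minimum=0.03):
    """True if the leverage ratio is at or above the minimum under test."""
    return leverage_ratio(tier1_capital, total_assets) >= minimum


low = total_capital_requirement()         # no countercyclical buffer: 10.5%
high = total_capital_requirement(0.025)   # full countercyclical buffer: 13%
ok = meets_leverage_minimum(tier1_capital=4.0, total_assets=100.0)
```

Note that the leverage check ignores risk weights altogether, which is why it restrains balance sheet size regardless of the riskiness of the assets.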


It should also be mentioned that under an ongoing review of the trading book, the 
principle of risk quantification may change from one based on VaR (a percentile) 
to one based on expected shortfall (ES). For a given holding period, the ES at the 
99% level, say, is the expected loss given that the loss is higher than the VaR at the 
99% level over the same period. ES is a severity measure that always dominates 
the frequency measure VaR and gives information about the expected size of tail 
losses; it is also a measure with superior aggregation properties to VaR, as discussed 
in Section 2.3.5 and Chapter 8 (particularly Sections 8.1 and 8.4.4). 
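
The relationship between the two risk measures is easy to check numerically. In the sketch below, which is only an illustration, we estimate ES as the average of the sample losses at or beyond the empirical VaR (a common convention for finite samples, adopted here as an assumption) and compare it with the closed-form value for a standard normal loss distribution. The function name and the choice of distribution are our own.

```python
import math
import random
from statistics import NormalDist


def var_and_expected_shortfall(losses, alpha=0.99):
    """Empirical VaR and ES of a sample of losses (positive = loss)."""
    if not losses:
        raise ValueError("need at least one loss observation")
    ordered = sorted(losses)
    k = max(0, math.ceil(alpha * len(ordered)) - 1)
    var = ordered[k]
    tail = ordered[k:]              # losses at or beyond the VaR estimate
    es = sum(tail) / len(tail)      # average tail loss
    return var, es


rng = random.Random(1)   # fixed seed for reproducibility
sample = [rng.gauss(0.0, 1.0) for _ in range(100_000)]
var_99, es_99 = var_and_expected_shortfall(sample, alpha=0.99)

# For a standard normal loss, ES at level alpha equals pdf(z_alpha) / (1 - alpha).
std_normal = NormalDist()
z_99 = std_normal.inv_cdf(0.99)
normal_es_99 = std_normal.pdf(z_99) / (1.0 - 0.99)
```

Because ES averages over the tail beyond VaR, the estimate es_99 can never be smaller than var_99, in line with the statement that ES dominates VaR.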


1.3.2 The Solvency II Framework 


Below we give an outline of the Solvency II framework, which will come into force 
in the countries of the European Union on or before 1 January 2016. 


Main features. In common with the Basel Accords, Solvency II adopts a three- 
pillar system, where the first pillar requires the quantification of regulatory capital 
requirements, the second pillar is concerned with governance and supervision, and 
the third pillar requires the disclosure of information to the public to improve market 
discipline by making it easier to compare the risk profiles of companies. 

Under Pillar 1, a company calculates its solvency capital requirement, which is 
the amount of capital it should have to ensure that the probability of insolvency over 
a one-year period is no more than 0.5%—this is often referred to as a confidence 
level of 99.5%. The company also calculates a smaller minimum capital require- 
ment, which is the minimum capital it should have to continue operating without 
supervisory intervention. 

To calculate the capital requirements, companies may use either an internal model 
or a simpler standard formula approach. In either case the intention is that a total 
balance sheet approach is taken in which all risks and their interactions are con- 
sidered. The insurer should have own funds (a surplus of assets over liabilities) that 
exceed both the solvency capital requirement and the minimum capital requirement. 
The assets and liabilities of the firm should be valued in a market-consistent manner. 

The supervisory review of the company takes place under Pillar 2. The company 
must demonstrate that it has a risk-management system in place and that this system 
is integrated into decision-making processes, including the setting of risk appetite 
by the company’s board, and the formulation of risk limits for different business 
units. An internal model must pass the “use test”: it must be an integral part of the 
risk-management system and be actively used in the running of the firm. Moreover, 
a firm must undertake an ORSA as described below. 


Market-consistent valuation. In Solvency II the valuation must be carried out
according to market-consistent principles. Where possible it should be based on 
actual market values, in a process known as marking-to-market. In a Solvency II 
glossary provided by the Comité Européen des Assurances and the Groupe Consul- 
tatif in 2007, market value is defined as: 


The amount for which an asset could be exchanged or a liability settled, 
between knowledgeable, willing parties in an arm’s length transaction, 
based on observable prices within an active, deep and liquid market 
which is available to and generally used by the entity. 


The concept of market value is related to the concept of fair value in accounting, 
and the principles adopted in Solvency II valuation have been influenced by Interna- 
tional Financial Reporting Standards (IFRS) accounting standards. When no relevant 
market values exist (or when they do not meet the quality criteria described by the 
concept of an “active, deep and liquid market”), then market-consistent valuation 
requires the use of models that are calibrated, as far as possible, to be consistent 
with financial market information, a process known as marking-to-model; we dis- 
cuss these ideas in more detail in Section 2.2.2. 

The market-consistent valuation of the liabilities of an insurer is possible when the 
cash flows paid to policyholders can be fully replicated by the cash flows generated 
by the so-called matching assets that are held for that purpose; the value of the 
liability is then given by the value of the replicating portfolio of matching assets. 
However, it is seldom the case that liabilities can be fully replicated and hedged; 
mortality risk is a good example of a risk factor that is difficult to hedge. 

The valuation of the unhedgeable part of a firm’s liabilities is carried out by 
computing the sum of a best estimate of these liabilities (basically an expected 
value) plus an extra risk margin to cover some of the uncertainty in the value of the 
liability. The idea of the risk margin is that a third party would not be willing to take 
over the unhedgeable liability for a price set at the best estimate but would have to 
be further compensated for absorbing the additional uncertainty about the true value 
of the liability. 


Standard formula approach. Under this approach an insurer calculates capital 
charges for different kinds of risk within a series of modules. There are modules, 
for example, for market risk, counterparty default risk, life underwriting risk, non- 
life underwriting risk and health insurance risk. The risk charges arising from these 
modules are aggregated to obtain the solvency capital requirement using a formula 
that involves a set of prescribed correlation parameters (see Section 8.4.2). 

Within each module, the approach drills down to fundamental risk factors; for 
example, within the market-risk module, there are sub-modules relating to interest- 
rate risk, equity risk, credit-spread risk and other typical market-risk factors. Capital 
charges are calculated with respect to each risk factor by considering the effect of a 
series of defined stress scenarios on the value of net assets (assets minus liabilities). 
The stress scenarios are intended to represent 1-in-200-year events (i.e. events with
an annual probability of 0.5%). 

The capital charges for each risk factor are aggregated to obtain the module risk
charge using a similar kind of formula to the one used at the highest level. Once 
again, a set of correlations expresses the regulatory view of dependencies between 
the effects of the fundamental risk factors. The details are complex and run to many 
pages, but the approach is simple and highly prescriptive. 
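
The aggregation step used at both levels of the standard formula can be sketched as a square-root formula in the individual charges and their correlations. The code below is an illustration only: the function name is ours, the two charges and the correlation values are hypothetical rather than the prescribed regulatory parameters, and further adjustments made in the actual standard formula are ignored.

```python
from math import sqrt


def aggregate_charges(charges, corr):
    """Aggregate stand-alone capital charges with a correlation matrix.

    Returns sqrt(sum_i sum_j corr[i][j] * charges[i] * charges[j]).
    """
    n = len(charges)
    if len(corr) != n or any(len(row) != n for row in corr):
        raise ValueError("correlation matrix must be n x n")
    total = sum(
        corr[i][j] * charges[i] * charges[j]
        for i in range(n)
        for j in range(n)
    )
    return sqrt(total)


charges = [30.0, 40.0]   # hypothetical charges for two risk factors

uncorrelated = aggregate_charges(charges, [[1.0, 0.0], [0.0, 1.0]])   # 50
partial = aggregate_charges(charges, [[1.0, 0.25], [0.25, 1.0]])
comonotone = aggregate_charges(charges, [[1.0, 1.0], [1.0, 1.0]])     # 70
```

The aggregate charge lies between the uncorrelated value and the simple sum of the charges, so the prescribed correlations determine how much diversification benefit the regulator is prepared to recognize.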


Internal-model approach. Under this approach firms can develop an internal model 
for the financial and underwriting risk factors that affect their business; they may 
then seek regulatory approval to use this model in place of the standard formula. The 
model often takes the form of a so-called economic scenario generator in which 
risk-factor scenarios for a one-year period are randomly generated and applied to 
the assets and liabilities to determine the solvency capital requirement. Economic 
scenario generators vary greatly in their detail, ranging from simple distributional 
models to more sophisticated dynamic models in discrete or continuous time. 
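
A toy version of this procedure is sketched below. Everything in it is a hypothetical simplification of our own: the balance sheet, the two independent normally distributed risk factors (an equity return and a parallel shift in interest rates), and the duration-based revaluation of bonds and liabilities. The solvency capital requirement is estimated as the 99.5th percentile of the simulated one-year loss in own funds.

```python
import math
import random


def simulate_own_funds_losses(n_scenarios, seed=0):
    """Simulate one-year losses in own funds under a toy scenario generator."""
    rng = random.Random(seed)
    equity, bonds, liabilities = 40.0, 60.0, 80.0   # hypothetical balance sheet
    bond_duration, liability_duration = 5.0, 8.0    # hypothetical durations
    own_funds_now = equity + bonds - liabilities
    losses = []
    for _ in range(n_scenarios):
        equity_return = rng.gauss(0.05, 0.20)   # one-year equity return
        rate_shift = rng.gauss(0.0, 0.01)       # parallel shift in interest rates
        equity_1y = equity * (1.0 + equity_return)
        # First-order (duration) revaluation of bonds and liabilities.
        bonds_1y = bonds * (1.0 - bond_duration * rate_shift)
        liabilities_1y = liabilities * (1.0 - liability_duration * rate_shift)
        own_funds_1y = equity_1y + bonds_1y - liabilities_1y
        losses.append(own_funds_now - own_funds_1y)
    return losses


def solvency_capital_requirement(losses, level=0.995):
    """Empirical quantile of the loss in own funds at the given level."""
    ordered = sorted(losses)
    k = max(0, math.ceil(level * len(ordered)) - 1)
    return ordered[k]


losses = simulate_own_funds_losses(20_000)
scr = solvency_capital_requirement(losses)
```

In this toy balance sheet the liabilities have a longer duration than the bonds, so a fall in interest rates reduces own funds; a realistic generator would also model dependence between the risk factors and many more of them.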


ORSA. In a 2008 Issues Paper produced by CEIOPS, the ORSA is described as
follows: 


The entirety of the processes and procedures employed to identify, 
assess, monitor, manage, and report the short and long term risks a 
(re)insurance undertaking faces or may face and to determine the own 
funds necessary to ensure that the undertaking’s overall solvency needs 
are met at all times. 


The concept of an ORSA is not unique to Solvency II and a useful alternative 
definition has been provided by the NAIC in the US on its website: 


In essence, an ORSA is an internal process undertaken by an insurer 
or insurance group to assess the adequacy of its risk management and 
current and prospective solvency positions under normal and severe 
stress scenarios. An ORSA will require insurers to analyze all reason- 
ably foreseeable and relevant material risks (i.e., underwriting, credit, 
market, operational, liquidity risks, etc.) that could have an impact on 
an insurer’s ability to meet its policyholder obligations. 


The Pillar 2 ORSA is distinguished from the Pillar 1 capital calculations in a number 
of ways. First, the definition makes clear that the ORSA refers to a process, or set 
of processes, and not simply an exercise in regulatory compliance. Second, each 
firm’s ORSA is its own process and is likely to be unique, since it is not bound 
by a common set of rules. In contrast, the standard-formula approach to Pillar 1 is 
clearly a uniform process for all companies; moreover, firms that seek internal-model 
approval for Pillar 1 are subject to very similar constraints. 

Finally, the ORSA goes beyond the one-year time horizon (which is a limitation 
of Pillar 1) and forces firms to assess solvency over their business planning hori- 
zon, which can mean many years for typical long-term business lines, such as life 
insurance. 


1.3.3 Criticism of Regulatory Frameworks


The benefits arising from the regulation of financial services are not generally in 
doubt. Customer-protection acts, responsible corporate governance, fair and com- 
parable accounting rules, transparent information on risk, capital and solvency for 
shareholders and clients are all viewed as positive developments. 

Very few would argue the extreme position that the prudential regulatory frame- 
works we have discussed are not needed; in general, after a crisis, the demand (at 
least from the public and politicians) is for more regulation. Nevertheless, there are 
aspects of the regulatory frameworks that have elicited criticism, as we now discuss. 


Cost and complexity. The cost factor of setting up a well-functioning risk- 
management system compliant with the present regulatory framework is significant, 
especially (in relative terms) for smaller institutions. On 27 March 2013, the Financial
Times quoted Andrew Bailey (head of the Prudential Regulation Authority in
the UK) as saying that Solvency II compliance was set to cost UK companies at 
least £3 billion, a “frankly indefensible” amount. Related to the issue of cost is the 
belief that regulation, in its attempt to become more risk sensitive, is becoming too 
complex; this theme is taken up by the Basel Committee in their 2013 discussion 
paper entitled “The regulatory framework: balancing risk sensitivity, simplicity and 
comparability” (Basel Committee on Banking Supervision 2013b). 


Endogenous risk. In general terms, this refers to the risk that is generated within 
a system and amplified by the system due to feedback effects. Regulation, a feature 
of the system, may be one of the channels by which shocks are amplified. 

Regulation can lead to risk-management herding, whereby institutions following 
similar (perhaps VaR-based) rules may all be “running for the same exit” in times of 
crisis, consequently destabilizing an already precarious situation even further. This 
herding phenomenon has been suggested in connection with the 1987 stock market 
crash and the events surrounding the 1998 LTCM crisis (Danielsson et al. 2001b). 

An even more compelling example was observed during the 2007-9 crisis; to 
comply with regulatory capital ratios in a market where asset values were falling and 
risks increasing, firms adjusted their balance sheets by selling assets, causing further 
asset value falls and vanishing market liquidity. This led to criticism of the inherently 
procyclical nature of the Basel II regulation, whereby capital requirements may rise 
in times of stress and fall in times of expansion; the Basel III proposals attempt to 
address this issue with a countercyclical capital buffer. 


Consequences of fair-value accounting and market-consistent valuation. The 
issue of procyclicality is also related to the widespread use of fair-value accounting 
and market-consistent valuation, which are at the heart of both the Basel rules for the 
trading book and the Solvency II framework. The fact that capital requirements are 
so closely coupled to volatile financial markets has been another focus of criticism. 

An example of this is the debate around the valuation of insurance liabilities in 
periods of market stress. A credit crisis, of the kind experienced in 2007-9, can 
impact the high-quality corporate bonds that insurance companies hold on the asset 
side of their balance sheets. The relative value of corporate bonds compared with
safe government bonds can fall sharply as investors demand more compensation for 
taking on both the credit risk and, in particular, the liquidity risk of corporate bonds. 

The effect for insurers is that the value of their assets falls relative to the value 
of their liabilities, since the latter are valued by comparing cash flows with safe 
government bonds. At a particular point in time, an insurer may appear to have 
insufficient capital to meet solvency capital requirements. However, if an insurer has 
matched its asset and liability cash flows and can continue to meet its contractual 
obligations to policyholders, the apparent depletion of capital may not be a problem; 
insurance is a long-term business and the insurer has no short-term need to sell assets 
or offload liabilities, so a loss of capital need not be realized unless some of the bonds 
actually default. 

Regulation that paints an unflattering picture of an insurer’s solvency position 
is not popular with regulated firms. Firms have argued that they should be able to 
value liabilities at a lower level, by comparing the cash flows not with expensive 
government bonds but instead with the corporate bonds that are actually used as 
matching assets, making allowance only for the credit risk in corporate bonds. This 
has given rise to the idea of discounting with an extra illiquidity premium, or match- 
ing premium, above a risk-free rate. There has been much debate about this issue 
between those who feel that such proposals undermine market-consistent valuation 
and those who believe that strict adherence to market-consistent valuation overstates 
risk and has potential systemic consequences (see, for example, Wüthrich 2011).
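
The effect of an illiquidity premium on the value of liabilities can be seen in a simple discounting exercise. The sketch below is illustrative only: the cash flows, the flat risk-free rate of 2% and the premium of 0.5% are hypothetical numbers of our own, and a flat, annually compounded discount curve is assumed.

```python
def present_value(cash_flows, rate):
    """Present value of annual cash flows paid at the end of years 1, 2, ..."""
    return sum(cf / (1.0 + rate) ** t for t, cf in enumerate(cash_flows, start=1))


liability_cash_flows = [10.0] * 20   # hypothetical payments over twenty years
risk_free_rate = 0.02                # hypothetical flat risk-free rate
illiquidity_premium = 0.005          # hypothetical premium above the risk-free rate

value_risk_free = present_value(liability_cash_flows, risk_free_rate)
value_with_premium = present_value(
    liability_cash_flows, risk_free_rate + illiquidity_premium
)
# Reduction in the liability value, and hence apparent gain in own funds.
capital_relief = value_risk_free - value_with_premium
```

Since a higher discount rate always lowers the present value of fixed payments, the premium directly reduces the reported liabilities, which is why its use is contested by those who see it as a departure from market-consistent valuation.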


Limits to quantification. Further criticism has been levelled at the highly quan- 
titative nature of regulation and the extensive use of mathematical and statistical 
methods. The section on “Misplaced reliance on sophisticated mathematics” in the 
Turner Review of the global banking crisis (Lord Turner 2009) states that: 


The very complexity of the mathematics used to measure and manage 
risk, moreover, made it increasingly difficult for top management and 
boards to assess and exercise judgement over the risk being taken. 
Mathematical sophistication ended up not containing risk, but providing 
false assurances that other prima facie indicators of increasing risk 
(e.g. rapid credit extension and balance sheet growth) could be safely 
ignored. 


This idea that regulation can lead to overconfidence in the quality of statistical 
risk measures is related to the view that the essentially backward-looking nature 
of estimates derived from historical data is a weakness. The use of conventional 
VaR-based methods has been likened to driving a car while looking in the rear-view 
mirror, the idea being that this is of limited use in preparing for the shocks that lie 
ahead. 

The extension of the quantitative approach to operational risk has been contro- 
versial. Whereas everyone agrees that risks such as people risk (e.g. incompetence, 
fraud), process risk (e.g. model, transaction and operational control risk), technology
risk (e.g. system failure, programming error) and legal risk are important, there
is much disagreement on the extent to which these risks can be measured. 


Limits to the efficacy of regulation. Finally, there is some debate about whether or 
not tighter regulation can ever prevent the occurrence of crises like that of 2007-9. 
The sceptical views of central bankers and regulatory figures were reported in the 
Economist in an article entitled “The inevitability of instability” (25 January 2014) 
(see also Prates 2013). The article suggests that “rules are constantly overtaken by 
financial innovation” and refers to the economist J. K. Galbraith (1993), who wrote: 


All financial innovation involves, in one form or another, the creation 
of debt secured in greater or lesser adequacy by real assets.... All crises 
have involved debt that, in one fashion or another, has become danger- 
ously out of scale in relation to the underlying means of payment. 


Tightening up the capital treatment of securitizations may prevent a recurrence of 
the events surrounding the 2007-9 crisis, but, according to the sceptical view, it will 
not prevent different forms of debt-fuelled crisis in the future. 


1.4 Why Manage Financial Risk? 


An important issue that we have barely touched upon is the reason for investing in 
risk management in the first place. This question can be addressed from various per- 
spectives, including those of the customer of a financial institution, its shareholders, 
its management, its board of directors, regulators, politicians, or the general public; 
each of these stakeholders may have a different view. In the selective account we 
give here, we focus on two viewpoints: that of society as a whole, and that of the 
shareholders (owners) of a firm. 


1.4.1 A Societal View 


Modern society relies on the smooth functioning of banking and insurance systems, 
and it has a collective interest in the stability of such systems. The regulatory pro- 
cess that has given us the Basel and Solvency II frameworks was initially motivated 
by the desire to prevent the insolvency of individual institutions, thus protecting 
customers and policyholders; this is sometimes referred to as a microprudential 
approach. However, the reduction of systemic risk—the danger that problems in a 
single financial institution may spill over and, in extreme situations, disrupt the nor- 
mal functioning of the entire financial system—has become an important secondary 
focus, particularly since the 2007-9 crisis. Regulation therefore now also takes a 
macroprudential perspective. 

Most members of society would probably agree that protection of customers 
against the failure of an individual firm is an important aim, and there would be 
widespread agreement that the promotion of financial stability is vital. However, 
it is not always clear that the two aims are well aligned. While there are clearly 
situations where the failure of one company may lead to spillover effects that result
in a systemic crisis, there may also be situations where the long-term interests of
financial stability are better served by allowing a company to fail: it may provide a 
lesson in the importance of better risk management for other companies. This issue 
is clearly related to the systemic importance of the company in question: in other 
words, to its size and the extent of its connectivity to other firms. But the recognition 
that there may be firms that are too important or are too big to fail creates a moral 
hazard, since the management of such a firm may take more risk in the knowledge 
that the company would be bailed out in a crisis. Of course, it may be the case that 
in some countries some institutions are also too big to save. 

The 2007-9 crisis provided a case study that brought many of these issues to 
the fore. As we noted in our account of the crisis in Section 1.2, it was initially 
believed that the growth in securitization was dispersing credit risk throughout the 
system and was beneficial to financial stability. But the warehousing of vast amounts 
of inadequately capitalized credit risk (in the form of CDOs) in trading books, 
combined with the interconnectedness of banks through derivatives and interbank 
lending activities, meant that quite the opposite was true. The extent of the systemic 
risk that had been accumulating became apparent when Lehman Brothers filed for 
bankruptcy on 15 September 2008 and governments intervened to bail out the banks. 

It was during the following phase of the crisis that society suffered. The world
economy went into recession, households defaulted on their debts, and savings and 
pensions were hit hard. The crisis moved “from Wall Street to Main Street”. Natu- 
rally, this led to resentment as banking remained a highly rewarded profession and 
it seemed that the government-sponsored bailouts had allowed banks “to privatize 
their gains and socialize their losses”. 

There has been much debate since the crisis on whether the US government could 
have intervened to save Lehman, as it did for other firms such as AIG. In the Financial 
Times on 14 September 2009, the historian Niall Ferguson wrote: 


Like the executed British admiral in Voltaire’s famous phrase, Lehman 
had to die pour encourager les autres—to convince the other banks that 
they needed injections of public capital, and to convince the legislature 
to approve them. Not everything in history is inevitable; contingencies 
abound. Sometimes it is therefore right to say “if only”. But an imag- 
ined rescue of Lehman Brothers is the wrong counterfactual. The right 
one goes like this. If only Lehman’s failure and the passage of TARP 
had been followed—not immediately, but after six months—by a clear 
statement to the surviving banks that none of them was henceforth too 
big to fail, then we might actually have learnt something from this crisis. 


While it is difficult to speak with authority for “society”, the following conclu- 
sions do not seem unreasonable. The interests of society are served by enforcing the 
discipline of risk management in financial firms, through the use of regulation. Better 
risk management can reduce the risk of company failure and protect customers and 
policyholders who stand in a very unequal financial relationship with large firms. 
However, the regulation employed must be designed with care and should not promote
herding, procyclical behaviour or other forms of endogenous risk that could
result in a systemic crisis with far worse implications for society than the failure
of a single firm. Individual firms need to be allowed to fail on occasion, provided 
customers can be shielded from the worst consequences through appropriate com- 
pensation schemes. A system that allows firms to become too big to fail creates 
moral hazard and should be avoided. 


1.4.2 The Shareholder’s View 


It is widely believed that proper financial risk management can increase the value of 
a corporation and hence shareholder value. In fact, this is the main reason why corpo- 
rations that are not subject to regulation by financial supervisory authorities engage 
in risk-management activities. Understanding the relationship between shareholder 
value and financial risk management also has important implications for the design 
of risk-management systems. Questions to be answered include the following. 


• When does risk management increase the value of a firm, and which risks
should be managed?


• How should risk-management concerns factor into investment policy and
capital budgeting?


There is a rather extensive corporate-finance literature on the issue of “corporate 
risk management and shareholder value”. We briefly discuss some of the main 
arguments. In this way we hope to alert the reader to the fact that there is more to 
risk management than the mainly technical questions related to the implementation 
of risk-management strategies dealt with in the core of this book. 

The first thing to note is that from a corporate-finance perspective it is by no means 
obvious that in a world with perfect capital markets risk management enhances 
shareholder value: while individual investors are typically risk averse and should 
therefore manage the risk in their portfolios, it is not clear that risk management or 
risk reduction at the corporate level, such as hedging a foreign-currency exposure 
or holding a certain amount of risk capital, increases the value of a corporation. The 
rationale for this (at first surprising) observation is simple: if investors have access 
to perfect capital markets, they can do the risk-management transactions they deem 
necessary via their own trading and diversification. The following statement from the 
chief investment officer of an insurance company exemplifies this line of reasoning: 
“If our shareholders believe that our investment portfolio is too risky, they should
short futures on major stock market indices.” 

The potential irrelevance of corporate risk management for the value of a corporation
is an immediate consequence of the famous Modigliani–Miller Theorem
(Modigliani and Miller 1958). This result, which marks the beginning of modern 
corporate-finance theory, states that, in an ideal world without taxes, bankruptcy 
costs and informational asymmetries, and with frictionless and arbitrage-free cap- 
ital markets, the financial structure of a firm, and hence also its risk-management 
decisions, are irrelevant when assessing the firm’s value. Hence, in order to find 
reasons for corporate risk management, one has to “turn the Modigliani–Miller
Theorem upside down” and identify situations where risk management enhances
the value of a firm by deviating from the unrealistically strong assumptions of the
theorem. This leads to the following rationales for risk management. 


• Risk management can reduce tax costs. Under a typical tax regime the amount
of tax to be paid by a corporation is a convex function of its profits; by reducing
the variability in a firm’s cash flow, risk management can therefore lead to a
higher expected after-tax profit.


• Risk management can be beneficial, since a company may (and usually will)
have better access to capital markets than individual investors.


• Risk management can increase firm value in the presence of bankruptcy costs,
as it makes bankruptcy less likely.


• Risk management can reduce the impact of costly external financing on the
firm value, as it facilitates the achievement of optimal investment. 
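
The tax argument in the first point can be illustrated numerically. The following
is a minimal sketch, assuming a hypothetical tax schedule in which positive profits
are taxed at 30% and losses attract no refund, and a hypothetical firm whose pre-tax
profit is either 300 or -100 with equal probability. The schedule, the function names
and the figures are illustrative assumptions and are not taken from any actual tax code.

```python
def tax(profit: float, rate: float = 0.30) -> float:
    """Hypothetical convex tax schedule: profits are taxed, losses are not refunded."""
    return rate * max(profit, 0.0)


def expected_after_tax_profit(profits, probabilities):
    """Expected value of profit minus tax over a discrete set of outcomes."""
    return sum(p * (x - tax(x)) for x, p in zip(profits, probabilities))


# Volatile cash flow: +300 or -100 with equal probability (expected value 100).
unhedged = expected_after_tax_profit([300.0, -100.0], [0.5, 0.5])

# Smoothed cash flow with the same expected value: 100 with certainty.
hedged = expected_after_tax_profit([100.0], [1.0])

print(f"unhedged: {unhedged:.2f}, hedged: {hedged:.2f}")
```

Both cash flows have an expected pre-tax profit of 100, but under these assumptions
the smoothed cash flow yields an expected after-tax profit of about 70 against about
55 for the volatile one, as Jensen's inequality for a convex tax function suggests.
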


The last two points merit a more detailed discussion. Bankruptcy costs consist of 
direct bankruptcy costs, such as the cost of lawsuits, and the more important indirect 
bankruptcy costs. The latter may include liquidation costs, which can be substantial 
in the case of intangibles like research and development and knowhow. This is why 
high research and development spending appears to be positively correlated with the 
use of risk-management techniques. Moreover, increased likelihood of bankruptcy 
often has a negative effect on key employees, management and customer relations, 
in particular in areas where a client wants a long-term business relationship. For 
instance, few customers would want to enter into a life insurance contract with 
an insurance company that is known to be close to bankruptcy. On a related note, 
banks that are close to bankruptcy might be faced with the unpalatable prospect of 
a bank run, where depositors try to withdraw their money simultaneously. A further 
discussion of these issues is given in Altman (1993). 

It is a “stylized fact” of corporate finance that for a corporation, external funds are 
more costly to obtain than internal funds, an observation which is usually attributed 
to problems of asymmetric information between the management of a corporation 
and bond and equity investors. For instance, raising external capital from outsiders 
by issuing new shares might be costly if the new investors, who have incomplete 
information about the economic prospects of a firm, interpret the share issue as a sign 
that the firm is overvalued. This can generate a rationale for risk management for the 
following reason: without risk management the increased variability of a company’s 
cash flow will be translated either into an increased variability of the funds that need 
to be raised externally or into an increased variability in the amount of investment.
With increasing marginal costs of raising external capital and decreasing marginal 
profits from new investment, we are left with a decrease in (expected) profits. Proper 
risk management, which amounts to a smoothing of the cash flow generated by a 
corporation, can therefore be beneficial. For references to the literature see Notes 
and Comments below.


1.5 Quantitative Risk Management


The aim of this chapter has been to place QRM in a larger historical, regulatory and 
even societal framework, since a study of QRM without a discussion of its proper 
setting and motivation makes little sense. In the remainder of the book we adopt 
a somewhat narrower view and treat QRM as a quantitative science that uses the 
language of mathematics in general, and of probability and statistics in particular. 

In this section we discuss the relevance of the Q in QRM, describe the quantitative 
modelling challenge that we have attempted to meet in this book, and end with 
thoughts on where QRM may lead in the future. 


1.5.1 The Q in QRM 


In Section 1.2.1 we discussed the view that the use of advanced mathematical mod- 
elling and valuation techniques has been a contributory factor in financial crises, 
particularly those attributed to derivative products, such as CDOs in the 2007-9 
crisis. We have also referred to criticism of the quantitative, statistical emphasis 
of the modern regulatory framework in Section 1.3.3. These arguments must be 
taken seriously, but we believe that it is neither possible nor desirable to remove the 
quantitative element from risk management. 

Mathematics and statistics provide us with a suitable language and appropriate 
concepts for describing financial risk. This is clear for complex financial products 
such as derivatives, which cannot be valued and handled without mathematical 
models. But the need for quantitative modelling also arises for simpler products, 
such as a book of mortgages for retail clients. The main risk in managing such a book 
is the occurrence of disproportionately many defaults: a risk that is directly related 
to the dependence between defaults (see Chapter 11 for details). In order to describe 
this dependence, we need mathematical concepts from multivariate statistics, such 
as correlations or copulas; if we want to carry out a simulation study of the behaviour 
of the portfolio under different economic scenarios, we need a mathematical model 
that describes the joint distribution of default events; if the portfolio is large, we 
will also need advanced simulation techniques to generate the relevant scenarios 
efficiently. 
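
To make the role of default dependence concrete, the following sketch simulates the
number of defaults in a hypothetical book of 200 loans, each with a default probability
of 2%, once with independent defaults and once with dependent defaults. The one-factor
Gaussian threshold model used here, the asset correlation of 0.2 and all function names
are assumptions made purely for illustration.

```python
import random
from statistics import NormalDist


def simulate_default_counts(n_loans, p_default, rho, n_sims, seed=0):
    """Simulate default counts in a one-factor Gaussian threshold model.

    Loan i defaults when sqrt(rho) * Z + sqrt(1 - rho) * eps_i falls below
    the threshold Phi^{-1}(p_default), where Z is a factor common to all loans.
    """
    rng = random.Random(seed)
    std_normal = NormalDist()
    threshold = std_normal.inv_cdf(p_default)
    counts = []
    for _ in range(n_sims):
        z = rng.gauss(0.0, 1.0)
        # Conditional on Z, loans default independently with this probability.
        p_cond = std_normal.cdf((threshold - rho ** 0.5 * z) / (1.0 - rho) ** 0.5)
        counts.append(sum(rng.random() < p_cond for _ in range(n_loans)))
    return counts


def tail_frequency(counts, level):
    """Fraction of simulations with at least `level` defaults."""
    return sum(c >= level for c in counts) / len(counts)


independent = simulate_default_counts(200, 0.02, rho=0.0, n_sims=5000)
dependent = simulate_default_counts(200, 0.02, rho=0.2, n_sims=5000)

# "Disproportionately many" defaults: three times the expected number of four.
freq_independent = tail_frequency(independent, 12)
freq_dependent = tail_frequency(dependent, 12)
print(freq_independent, freq_dependent)
```

Both books have an expected number of defaults of four, yet under these assumptions
twelve or more defaults should occur far more often in the dependent book than in the
independent one.
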

Moreover, mathematical and statistical methods can do better than they did in the 
2007-9 crisis. In fact, providing concepts, techniques and tools that address some 
of the weaker points of current methodology is a main theme of our text and we 
come back to this point in the next section. 

There is a view that, instead of using mathematical models, there is more to 
be learned about risk management through a qualitative analysis of historical case 
studies and the formulation of narratives. What is often overlooked by the non- 
specialist is that mathematical models are themselves nothing more than narratives, 
albeit narratives couched in a precise symbolic language. Addressing the question 
“What is mathematics?”, Gale and Shapley (1962) wrote: “Any argument which is 
carried out with sufficient precision is mathematical.” Lloyd Shapley went on to win 
the 2012 Nobel Memorial Prize in Economic Science.

It is certainly true that mathematical methods can be misused. Mathematicians
are very well aware that a mathematical result has not only a conclusion but, equally 
importantly, certain conditions under which it holds. Statisticians are well aware 
that inductive reasoning on the basis of models relies on the assumption that these 
conditions hold in the real world. This is especially true in economics, which as a 
social science is concerned with phenomena that are not easily described by clear 
mathematical or physical laws. By starting with questionable assumptions, mod- 
els can be used (or manipulated) to deliver bad answers. In a talk on 20 March 
2009, the economist Roger Guesnerie said, “For this crisis, mathematicians are 
innocent... and this in both meanings of the word.” The implication is that quanti- 
tative risk managers must become more worldly about the ways in which models are 
used. But equally, the regulatory system needs to be more vigilant about the ways 
in which models can be gamed and the institutional pressures that can circumvent 
the best intentions of prudent quantitative risk managers. 

We are firmly of the opinion—an opinion that has only been reinforced by our 
study of financial crises—that the Q in QRM is an essential part of the process. We 
reject the idea that the Q is part of the problem, and we believe that it remains (if 
applied correctly and honestly) a part of the solution to managing risk. In summary, 
we strongly agree with Shreve (2008), who said: 


Don’t blame the quants. Hire good ones instead and listen to them. 


1.5.2 The Nature of the Challenge 


When we began this book project we set ourselves the task of defining a new dis- 
cipline of QRM. Our approach to this task has had two main strands. On the one 
hand, we have attempted to put current practice onto a firmer mathematical footing, 
where, for example, concepts like P&L distributions, risk factors, risk measures, 
capital allocation and risk aggregation are given formal definitions and a consistent 
notation. In doing this we have been guided by the consideration of what topics 
should form the core of a course on QRM for a wide audience of students inter- 
ested in risk-management issues; nonetheless, the list is far from complete and will 
continue to evolve as the discipline matures. On the other hand, the second strand 
of our endeavour has been to put together material on techniques and tools that go 
beyond current practice and address some of the deficiencies that have been repeat- 
edly raised by critics. In the following paragraphs we elaborate on some of these 
issues. 


Extremes matter. A very important challenge in QRM, and one that makes it par- 
ticularly interesting as a field for probability and statistics, is the need to address 
unexpected, abnormal or extreme outcomes, rather than the expected, normal or 
average outcomes that are the focus of many classical applications. This is in tune 
with the regulatory view expressed by Alan Greenspan in 1995 at the Joint Central 
Bank Research Conference: 


From the point of view of the risk manager, inappropriate use of the 
normal distribution can lead to an understatement of risk, which must be
balanced against the significant advantage of simplification. From the
central bank’s corner, the consequences are even more serious because 
we often need to concentrate on the left tail of the distribution in for- 
mulating lender-of-last-resort policies. Improving the characterization 
of the distribution of extreme values is of paramount importance. 


While the quote is older, the same concern about underestimation of extremes is 
raised in a passage in the Turner Review (Lord Turner 2009): 


Price movements during the crisis have often been of a size whose 
probability was calculated by models (even using longer-term inputs) 
to be almost infinitesimally small. This suggests that the models sys- 
tematically underestimated the chances of small probability high impact 
events.... It is possible that financial market movements are inherently 
characterized by fat-tail distributions. VaR models need to be buttressed 
by the application of stress test techniques which consider the impact 
of extreme movements beyond those which the model suggests are at 
all probable. 


Much space in our book is devoted to models for financial risk factors that go beyond 
the normal (or Gaussian) model and attempt to capture the related phenomena of 
heavy or fat tails, excess volatility and extreme values. 
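
The understatement of risk by the normal model can be made concrete with a small
calculation. As a sketch, the code below compares the probability of a loss of more
than k standard deviations under a normal distribution with the same probability under
a Student t distribution with four degrees of freedom, rescaled to unit variance. The
choice of four degrees of freedom and the function names are illustrative assumptions;
the closed-form expression used for the t distribution function is specific to that
case.

```python
from statistics import NormalDist


def student_t4_cdf(t: float) -> float:
    """Closed-form CDF of the Student t distribution with 4 degrees of freedom."""
    u = t / (1.0 + t * t / 4.0) ** 0.5
    return 0.5 + 0.375 * (u - u ** 3 / 12.0)


def normal_tail(k: float) -> float:
    """P(X <= -k) for a standard normal variable X."""
    return NormalDist().cdf(-k)


def t4_tail(k: float) -> float:
    """P(X <= -k) for a t variable with 4 degrees of freedom and unit variance."""
    # A t4 variable T has variance 4 / (4 - 2) = 2, so X = T / sqrt(2).
    return student_t4_cdf(-k * 2.0 ** 0.5)


for k in (2, 3, 4, 5):
    ratio = t4_tail(k) / normal_tail(k)
    print(f"k={k}: normal={normal_tail(k):.2e}, t4={t4_tail(k):.2e}, ratio={ratio:.1f}")
```

The two models roughly agree at two standard deviations, but the gap widens quickly
further out in the tail: at four standard deviations the heavy-tailed model assigns a
probability more than 70 times larger than the normal model does.
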


The interdependence and concentration of risks. A further important challenge is 
presented by the multivariate nature of risk. Whether we look at market risk or credit 
risk, or overall enterprise-wide risk, we are generally interested in some form of 
aggregate risk that depends on high-dimensional vectors of underlying risk factors, 
such as individual asset values in market risk or credit spreads and counterparty 
default indicators in credit risk. 

A particular concern in our multivariate modelling is the phenomenon of depend- 
ence between extreme outcomes, when many risk factors move against us simulta- 
neously. In connection with the LTCM case (see Section 1.2.1) we find the following 
quote in Business Week (September 1998): 


Extreme, synchronized rises and falls in financial markets occur infre- 
quently but they do occur. The problem with the models is that they did 
not assign a high enough chance of occurrence to the scenario in which 
many things go wrong at the same time—the “perfect storm” scenario. 


In a perfect storm scenario the risk manager discovers that portfolio diversification 
arguments break down and there is much more of a concentration of risk than had 
been imagined. This was very much the case with the 2007-9 crisis: when borrowing 
rates rose, bond markets fell sharply, liquidity disappeared and many other asset 
classes declined in value, with only a few exceptions (such as precious metals and 
agricultural land), a perfect storm was created. 
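
The breakdown of diversification in the tails can be illustrated by simulation. The
following sketch compares two hypothetical pairs of risk factors with the same
correlation of 0.5: one pair is bivariate normal and the other is bivariate Student t
with four degrees of freedom. It estimates how often the second factor is in its worst
1% of outcomes given that the first one is. The models, parameter values and function
names are assumptions chosen for illustration only.

```python
import random


def sample_pairs(n, rho, dof=None, seed=1):
    """Draw n pairs with correlation rho.

    The pairs are bivariate normal if dof is None, otherwise bivariate
    Student t with dof degrees of freedom.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + (1.0 - rho * rho) ** 0.5 * rng.gauss(0.0, 1.0)
        if dof is not None:
            # One chi-square mixing variable scales both components,
            # so that large moves tend to occur together.
            w = rng.gammavariate(dof / 2.0, 2.0)
            scale = (dof / w) ** 0.5
            z1, z2 = z1 * scale, z2 * scale
        pairs.append((z1, z2))
    return pairs


def joint_tail_frequency(pairs, level=0.01):
    """Estimate P(second factor in its worst `level` | first factor in its worst `level`)."""
    k = max(1, int(level * len(pairs)))
    q1 = sorted(x for x, _ in pairs)[k - 1]
    q2 = sorted(y for _, y in pairs)[k - 1]
    both = sum(x <= q1 and y <= q2 for x, y in pairs)
    return both / k


gauss_freq = joint_tail_frequency(sample_pairs(50000, 0.5))
t_freq = joint_tail_frequency(sample_pairs(50000, 0.5, dof=4))
print(gauss_freq, t_freq)
```

Empirical quantiles are used so that the comparison does not depend on the different
marginal distributions of the two models. Although the correlation is identical, joint
extreme outcomes should be markedly more frequent under the t model in this sketch.
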

We have mentioned (see Section 1.2.1) the notorious role of the Gauss copula in 
the 2007-9 financial crisis. An April 2009 article in the Economist, with the title
“In defence of the Gaussian copula”, evokes the environment at the time of the
securitization boom: 


By 2001, correlation was a big deal. A new fervour was gripping Wall 
Street—one almost as revolutionary as that which had struck when the 
Black-Scholes model brought about the explosion in stock options and 
derivatives in the early 1980s. This was structured finance, the cul- 
mination of two decades of quants on Wall Street.... The problem, 
however, was correlation. The one thing any off-balance-sheet securi- 
tisation could not properly capture was the interrelatedness of all the 
hundreds of thousands of different mortgage loans they owned. 


The Gauss copula appeared to solve this problem by offering a model for the cor- 
related times of default of the loans or other credit-risky assets; the perils of this 
approach later became clear. In fact, the Gauss copula is not an example of the use 
of oversophisticated mathematics; it is a relatively simple model that is difficult 
to calibrate reliably to available market information. The modelling of dependent 
credit risks, and the issue of model risk in that context, is a subject we look at in 
some detail in our treatment of credit risk. 


The problem of scale. A further challenge in QRM is the typical scale of the 
portfolios under consideration; in the most general case, a portfolio may represent 
the entire position in risky assets of a financial institution. Calibration of detailed 
multivariate models for all risk factors is an almost impossible task, and any sensible 
strategy must involve dimension reduction; that is to say, the identification of key 
risk drivers and a concentration on modelling the main features of the overall risk 
landscape. 

In short, we are forced to adopt a fairly broad-brush approach. Where we use 
econometric tools, such as models for financial return series, we are content with 
relatively simple descriptions of individual series that capture the main phenomenon 
of volatility, and which can be used in a parsimonious multivariate factor model. 
Similarly, in the context of portfolio credit risk, we are more concerned with finding 
suitable models for the default dependence of counterparties than with accurately 
describing the mechanism for the default of an individual, since it is our belief that 
the former is at least as important as the latter in determining the risk of a large 
diversified portfolio. 
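
The case for dimension reduction can be seen from a simple parameter count. As a rough
sketch, assume that d risk factors are described either by an unrestricted covariance
matrix or by a linear model with k common factors. The function names and this
particular parametrization of the factor model (loadings, idiosyncratic variances and
a factor covariance matrix) are illustrative assumptions.

```python
def full_covariance_params(d: int) -> int:
    """Free parameters in an unrestricted d x d covariance matrix."""
    return d * (d + 1) // 2


def factor_model_params(d: int, k: int) -> int:
    """Free parameters in a linear k-factor model for d risk factors.

    Counts d * k loadings, d idiosyncratic variances and the
    k * (k + 1) / 2 entries of the factor covariance matrix.
    """
    return d * k + d + k * (k + 1) // 2


d, k = 1000, 5
print(full_covariance_params(d), factor_model_params(d, k))
```

For 1000 risk factors and 5 common factors this count gives 500500 free parameters for
the unrestricted covariance matrix against 6015 for the factor model.
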


Interdisciplinarity. Another aspect of the challenge of QRM is the fact that ideas 
and techniques from several existing quantitative disciplines are drawn together. 
When one considers the ideal education for a quantitative risk manager of the 
future, then a combined quantitative skill set should undoubtedly include concepts, 
techniques and tools from such fields as mathematical finance, statistics, financial 
econometrics, financial economics and actuarial mathematics. Our choice of topics 
is strongly guided by a firm belief that the inclusion of modern statistical and econo- 
metric techniques and a well-chosen subset of actuarial methodology are essential 
for the establishment of best-practice QRM. QRM is certainly not just about financial 
mathematics and derivative pricing, important though these may be.


Communication and education. Of course, the quantitative risk manager operates
in an environment where additional non-quantitative skills are equally important. 
Communication is certainly an important skill: risk professionals, by the definition 
of their duties, will have to interact with colleagues with diverse training and back- 
grounds, at all levels of their organization. Moreover, a quantitative risk manager 
has to familiarize himself or herself quickly with all-important market practice and
institutional details. A certain degree of humility will also be required to recognize 
the role of QRM in a much larger picture. 

A lesson from the 2007-9 crisis is that improved education in QRM is essential; 
from the front office to the back office to the boardroom, the users of models and their 
outputs need to be better trained to understand model assumptions and limitations. 
This task of educating users is part of the role of a quantitative risk manager, who 
should ideally have (or develop) the pedagogical skills to explain methods and 
conclusions to audiences at different levels of mathematical sophistication. 


1.5.3 QRM Beyond Finance


The use of QRM technology is not restricted to the financial services industry, and 
similar developments have taken place, or are taking place, in other sectors of indus- 
try. Some of the earliest applications of QRM are to be found in the manufacturing 
industry, where similar concepts and tools exist under names like reliability or total 
quality control. Industrial companies have long recognized the risks associated with 
bringing faulty products to the market. The car manufacturing industry in Japan, in 
particular, was an early driving force in this respect. 

More recently, QRM techniques have been adopted in the transport and energy 
industries, to name but two. In the case of energy, there are obvious similarities 
with financial markets: electrical power is traded on energy exchanges; derivatives 
contracts are used to hedge future price uncertainty; companies optimize investment 
portfolios combining energy products with financial products; some Basel method- 
ology can be applied to modelling risk in the energy sector. However, there are also 
important dissimilarities due to the specific nature of the industry; most importantly, 
there are the issues of the cost of storage and transport of electricity as an under- 
lying commodity, and the necessity of modelling physical networks including the 
constraints imposed by the existence of national boundaries and quasi-monopolies. 

There are also markets for environmental emission allowances. For example, 
the Chicago Climate Futures Exchange offers futures contracts on sulphur dioxide 
emissions. These are traded by industrial companies producing the pollutant in 
their manufacturing process, and they force such companies to consider the cost of 
pollution as a further risk in their risk landscape. 

A natural consequence of the evolution of QRM thinking in different industries 
is an interest in the transfer of risks between industries; this process is known as 
alternative risk transfer. To date the best examples of risk transfer are between the 
insurance and banking industries, as illustrated by the establishment of catastrophe 
futures by the Chicago Board of Trade in 1992. These came about in the wake of 
Hurricane Andrew, which caused $20 billion of insured losses on the East Coast of
the US. While this was a considerable event for the insurance industry in relation
to overall reinsurance capacity, it represented only a drop in the ocean compared 
with the daily volumes traded worldwide on financial exchanges. This led to the 
recognition that losses could be covered in future by the issuance of appropriately 
structured bonds with coupon streams and principal repayments dependent on the 
occurrence or non-occurrence of well-defined natural catastrophe events, such as 
storms and earthquakes. 

A speculative view of where these developments may lead is given by Shiller 
(2003), who argues that the proliferation of risk-management thinking coupled with 
the technological sophistication of the twenty-first century will allow any agent in 
society, from a company to a country to an individual, to apply QRM methodology to 
the risks they face. In the case of an individual this may be the risk of unemployment, 
depreciation in the housing market or investment in the education of children. 


Notes and Comments 


The language of probability and statistics plays a fundamental role throughout this 
book, and readers are expected to have a good knowledge of these subjects. At 
the elementary level, Rice (1995) gives a good first introduction to both. More 
advanced texts in probability and stochastic processes are Williams (1991), Resnick 
(1992) and Rogers and Williams (1994); the full depth of these texts is certainly not 
required for the understanding of this book, though they provide excellent reading 
material for more mathematically sophisticated readers who also have an interest in 
mathematical finance. Further recommended texts on statistical inference include 
Casella and Berger (2002), Bickel and Doksum (2001), Davison (2003) and Lindsey 
(1996). 

In our discussion of risk and randomness in Section 1.1.1 we mentioned Knight 
(1921) and Keynes (1920), whose classic texts are very much worth revisiting. 
Knightian uncertainty refers to uncertainty that cannot be measured and is sometimes 
contrasted with risks that can be measured using probability. This relates to the 
more recent idea of a Black Swan event, a term popularized in Taleb (2007) but 
introduced in Taleb (2001). Black swans were believed to be imaginary creatures 
until the European exploration of Australia and the name is applied to unprecedented 
and unpredictable events that challenge conventional beliefs and models. Donald 
Rumsfeld, a former US Secretary of Defense, referred to “unknown unknowns” in a 
2002 news briefing on the evidence for the presence of weapons of mass destruction 
in Iraq. 

An excellent text on the history of risk and probability with financial applications 
in mind is Bernstein (1998). We also recommend Shiller (2012) for more on the 
societal context of financial risk management. A thought-provoking text addressing 
risk on Wall Street from a historical perspective is Brown (2012). 

For the mathematical reader looking to acquire more knowledge about the relevant 
economics we recommend Mas-Colell, Whinston and Green (1995) for microeco- 
nomics, Campbell, Lo and MacKinlay (1997) or Gouriéroux and Jasiak (2001) 
for econometrics, and Brealey and Myers (2000) for corporate finance. From the
vast literature on options, an entry-level text for the general reader is Hull (2014).
At a more mathematical level we like Bingham and Kiesel (2004), Musiela and 
Rutkowski (1997), Shreve (2004a) and Shreve (2004b). One of the most readable 
texts on the basic notion of options is Cox and Rubinstein (1985). For a rather exten- 
sive list of the kind of animals to be found in the zoological garden of derivatives, 
see, for example, Haug (1998). 

There are several texts on the spectacular losses that occurred as the result of 
speculative trading and the careless use of derivatives. For a historical overview of 
financial crises, see Reinhart and Rogoff (2009), as well as the much earlier Galbraith 
(1993) and Kindleberger (2000). Several texts exist on more recent crises; we list 
only a few. The LTCM case is well documented in Dunbar (2000), Lowenstein (2000)
and Jorion (2000), the latter particularly focusing on the technical risk-measurement 
issues involved. Boyle and Boyle (2001) give a very readable account of the Orange 
County, Barings and LTCM stories (see also Jacque 2010). For the Equitable Life 
case see the original Penrose Report, published by the UK government (Lord Penrose 
2004), or an interesting paper by Roberts (2012). Many books have emerged on the 
2007-9 crisis; early warnings are well summarized, under Greenspan’s memorable 
“irrational exuberance” phrase, in a pre-crisis book by Shiller (2000), and the post- 
mortem by the same author is also recommended (Shiller 2008). 

An overview of options embedded in life insurance products is given in Dillmann 
(2002), guarantees are discussed in detail in Hardy (2003), and Briys and de Varenne 
(2001) contains an excellent account of risk-management issues facing the (life) 
insurance industry. For risk-management and valuation issues underlying life insurance,
see Koller (2011) and Møller and Steffensen (2007). Market-consistent actuarial
valuation is discussed in Wüthrich, Bühlmann and Furrer (2010).

The historical development of banking regulation is well described in Crouhy, 
Galai and Mark (2001) and Steinherr (1998). For details of the current rules and 
regulations coming from the Basel Committee, see its website at www.bis.org/bcbs.
Besides copies of the various accords, one can also find useful working papers, publi- 
cations and comments written by stakeholders on the various consultative packages. 
For Solvency II and the Swiss Solvency Test, many documents are to be found on 
the web. Comprehensive textbook accounts are Sandström (2006) and Sandström
(2011), and a more technical treatment is found in Wüthrich and Merz (2013). The
complexity of risk-management methodology in the wake of Basel II is critically 
addressed by Hawke (2003), from his perspective as US Comptroller of the Cur- 
rency. Among the numerous texts written after the 2007-9 crisis, we found all of 
Rochet (2008), Shin (2010), Dewatripont, Rochet and Tirole (2010) and Bénéplanc 
and Rochet (2011) useful. For a discussion of issues related to the use of fair-value 
accounting during the financial crisis, see Ryan (2008). 

For a very detailed overview of relevant practical issues underlying risk man- 
agement, we again strongly recommend Crouhy, Galai and Mark (2001). A text 
stressing the use of VaR as a risk measure and containing several worked examples 
is Jorion (2007), whose author also has a useful teaching manual on the same subject 
(Jorion 2002b). Insurance-related issues in risk management are nicely presented in
Doherty (2000). 

For a comprehensive discussion of the management of bank capital given regu- 
latory constraints, see Matten (2000), Klaassen and van Eeghen (2009) and Admati 
and Hellwig (2013). Graham and Rogers (2002) contains a discussion of risk management
and tax incentives. A formal account of the Modigliani-Miller Theorem
and its implications can be found in many textbooks on corporate finance: a standard 
reference is Brealey and Myers (2000), and de Matos (2001) gives a more theoretical 
account from the perspective of modern financial economics. Both texts also discuss 
the implications of informational asymmetries between the various stakeholders in 
a corporation. Formal models looking at risk management from a corporate-finance 
angle are to be found in Froot and Stein (1998), Froot, Scharfstein and Stein (1993) 
and Stulz (1996, 2002). For a specific discussion on corporate-finance issues in 
insurance, see Froot (2007) and Hancock, Huber and Koch (2001). 

There are several studies on the use of risk-management techniques for non- 
financial firms (see, for example, Bodnar, Hayt and Marston 1998; Geman 2005, 
2009). Two references in the area of the reliability of industrial processes are Bedford 
and Cooke (2001) and Does, Roes and Trip (1999). Interesting edited volumes on 
alternative risk transfer are Shimpi (2001), Barrieu and Albertini (2009) and Kiesel, 
Scherer and Zagst (2010); a detailed study of model risk in the alternative risk transfer 
context is Schmock (1999). An area we have not mentioned so far in our discussion 
of QRM in the future is that of real options. A real option is the right, but not the 
obligation, to take an action (e.g. deferring, expanding, contracting or abandoning) 
at a predetermined cost called the exercise price. The right holds for a predetermined 
period of time—the life of the option. This definition is taken from Copeland and 
Antikarov (2001). Examples of real options discussed in the latter are the valuation 
of an internet project and of a pharmaceutical research and development project. A 
further useful reference is Brennan and Trigeorgis (1999). 

A well-written critical view of the failings of the standard approach to risk man- 
agement is given in Rebonato (2007). And finally, for an entertaining text on the 
biology of the much criticized “homo economicus”, we like Coates (2012). 

Basic Concepts in Risk Management


In this chapter we define or explain a number of fundamental concepts used in the 
measurement and management of financial risk. Beginning in Section 2.1 with the 
simplified balance sheet of a bank and an insurer, we discuss the risks faced by 
such firms, the nature of capital, and the need for a firm to have sufficient capital to 
withstand financial shocks and remain solvent. 

In Section 2.2 we establish a mathematical framework for describing changes 
in the value of portfolios and deriving loss distributions. We provide a number of 
examples to show how this framework applies to different kinds of asset and liability 
portfolios. The examples are also used to discuss the meaning of value in more detail 
with reference to fair-value accounting and risk-neutral valuation. 

Section 2.3 is devoted to the subject of using risk measures to determine risk or 
solvency capital. We present different quantitative approaches to measuring risk, 
with a particular focus on risk measures that are calculated from loss distributions, 
like value-at-risk and expected shortfall. 


2.1 Risk Management for a Financial Firm 
2.1.1 Assets, Liabilities and the Balance Sheet 


A good way to understand the risks faced by a modern financial institution is to look 
at the stylized balance sheet of a typical bank or insurance company. A balance sheet 
is a financial statement showing assets and liabilities; roughly speaking, the assets 
describe the financial institution’s investments, whereas liabilities refer to the way in 
which funds have been raised and the obligations that ensue from that fundraising. 

A typical bank raises funds by taking in customer deposits, by issuing bonds and 
by borrowing from other banks or from central banks. Collectively these form the 
debt capital of the bank, which is invested in a number of ways. Most importantly, 
it is used for loans to retail, corporate and sovereign customers, invested in traded 
securities, lent out to other banks or invested in property or in other companies. A 
small fraction is also held as cash. 

A typical insurance company sells insurance contracts, collecting premiums in 
return, and raises additional funds by issuing bonds. The liabilities of an insurance 
company thus consist of its obligations to policyholders, which take the form of 
a technical reserve against future claims, and its obligations to bondholders. The 
funds raised are then invested in traded securities, particularly bonds, as well as 
other assets such as real estate. 

Table 2.1. The stylized balance sheet of a typical bank.

Bank ABC (31 December 2015)

Assets                                    Liabilities
Cash (and central bank balance)   £10M    Customer deposits                  £80M
Securities                        £50M    Bonds issued
  - bonds                                   - senior bond issues             £25M
  - stocks                                  - subordinated bond issues       £15M
  - derivatives                           Short-term borrowing               £30M
Loans and mortgages              £100M    Reserves (for losses on loans)     £20M
  - corporates
  - retail and smaller clients            Debt (sum of above)               £170M
  - government
Other assets                      £20M
  - property
  - investments in companies              Equity                             £30M
Short-term lending                £20M
Total                            £200M    Total                             £200M


In both cases a small amount of extra funding stems from occasional share issues, 
which form the share capital of the bank or insurer. This form of funding is crucial 
as it entails no obligation towards outside parties. 

These simplified banking and insurance business models are reflected in the styl- 
ized balance sheets shown in Tables 2.1 and 2.2. In these financial statements, assets 
and liabilities are valued on a given date. The position marked equity on the liability 
side of the balance sheet is the residual value defined in the balance sheet equation 


value of assets = value of liabilities = debt + equity. (2.1) 


A company is solvent at a given point in time if the equity is nonnegative; otherwise
it is insolvent. Insolvency should be distinguished from the notion of default, which 
occurs if a firm misses a payment to its debtholders or other creditors. In particular, an 
otherwise-solvent company can default because of liquidity problems, as discussed 
in more detail in the next section. 
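
As a small illustration of the balance sheet equation (2.1), the following Python sketch recomputes the equity of Bank ABC from the figures in Table 2.1 and checks the solvency condition. The dictionary layout and the function names `equity` and `is_solvent` are illustrative assumptions and not notation used in the text; the £35M shock is a hypothetical scenario.

```python
# Figures (in £ millions) taken from Table 2.1; the data layout is an
# illustrative assumption.
assets = {
    "cash": 10,
    "securities": 50,
    "loans_and_mortgages": 100,
    "other_assets": 20,
    "short_term_lending": 20,
}
debt = {
    "customer_deposits": 80,
    "senior_bonds": 25,
    "subordinated_bonds": 15,
    "short_term_borrowing": 30,
    "loan_loss_reserves": 20,
}


def equity(assets, debt):
    """Residual value from (2.1): equity = value of assets - debt."""
    return sum(assets.values()) - sum(debt.values())


def is_solvent(assets, debt):
    """A company is solvent if its equity is nonnegative."""
    return equity(assets, debt) >= 0


bank_equity = equity(assets, debt)
print(bank_equity, is_solvent(assets, debt))  # 30 True

# Hypothetical shock: £35M is wiped off the value of the loan book.
shocked = dict(assets, loans_and_mortgages=assets["loans_and_mortgages"] - 35)
print(equity(shocked, debt), is_solvent(shocked, debt))  # -5 False
```

Note that the check concerns solvency only; as the text explains, default due to liquidity problems is a separate matter that this sketch does not model.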

It should be noted that assigning values to the items on the balance sheet of a 
bank or insurance company is a non-trivial task. Broadly speaking, two different 
approaches can be distinguished. The practice of fair-value accounting attempts 
to value assets at the prices that would be received if they were sold and to value 
liabilities at the prices that would have to be paid if they were transferred to another 
party. Fair-value accounting is relatively straightforward for positions that are close 
to securities traded on liquid markets, since these are simply valued by (an estimate 
of) their market price. It is more challenging to apply fair-value principles to non- 
traded or illiquid assets and liabilities. 

The more traditional practice of amortized cost accounting is still applied to many
kinds of financial asset and liability. Under this practice the position is assigned a
book value at its inception and this is carried forward over time. In some cases the
value is progressively reduced or impaired to account for the aging of the position or
the effect of adverse events. Examples of assets valued at book value are the loans
on the balance sheet of the bank. The book value would typically be an estimate of
the present value (at the time the loans were made) of promised future interest and
principal payments minus a provision for losses due to default.

Table 2.2. The stylized balance sheet of a typical insurer.

Insurer XYZ (31 December 2015)

Assets                                    Liabilities
Investments                               Reserves for policies written      £80M
  - bonds                         £50M    (technical provisions)
  - stocks                         £5M    Bonds issued                       £10M
  - real estate                    £5M
Investments for                   £30M    Debt (sum of above)                £90M
unit-linked contracts
Other assets                      £10M
  - property
                                          Equity                             £10M
Total                            £100M    Total                             £100M

In the European insurance industry the practice of market-consistent valuation has 
been promoted under the Solvency II framework. As described in Section 1.3.2, the 
rationale is very similar to that of fair-value accounting: namely, to value positions 
by “the amount for which an asset could be exchanged or a liability settled, between 
knowledgeable, willing parties in an arm’s length transaction, based on observable 
prices within an active, deep and liquid market”. However, there are some differences 
between market-consistent valuation and fair-value accounting for specific kinds of 
position. A European insurer will typically have two versions of the balance sheet 
in order to comply with accounting rules, on the one hand, and Solvency II rules 
for capital adequacy, on the other. The accounting balance sheet may mix fair-value 
and book-value approaches, but the Solvency II balance sheet will apply market- 
consistent principles throughout. 

Overall, across the financial industry, there is a tendency for the accounting stan- 
dard to move towards fair-value accounting, even if the financial crisis of 2007-9 
demonstrated that this approach is not without problems during periods when trading 
activity and market liquidity suddenly vanish (see Section 1.3.3 for more discussion 
of this issue). Fair-value accounting for financial products will be discussed in more 
detail in Section 2.2.2. 


2.1.2 Risks Faced by a Financial Firm 


An obvious source of risk for a bank is a decrease in the value of its investments on 
the asset side of the balance sheet. This includes market risk, such as losses from 
securities trading, and credit risk. Another important risk is related to funding and 
so-called maturity mismatch: for a typical bank, large parts of the asset side consist of
relatively illiquid, long-term investments such as loans or property, whereas a large 
part of the liabilities side consists of short-term obligations such as funds borrowed 
from money markets and most customer deposits. This may lead to problems when 
the cost of short-term refinancing increases due to rising short-term interest rates, 
because the banks may have difficulties selling long-term assets to raise funds. This 
can lead to the default of a bank that is technically solvent; in extreme cases there 
might even be a bank run, as was witnessed during the 2007-9 financial crisis. This 
clearly shows that risk is found on both sides of the balance sheet and that risk 
managers should not focus exclusively on the asset side. 

The primary risk for an insurance company is clearly insolvency, i.e. the risk that 
the claims of policyholders cannot be met. This can happen due to adverse events 
affecting the asset side or the liability side of the balance sheet. On the asset side, 
the risks are similar to those for a bank. On the liability side, the main risk is that 
reserves are insufficient to cover future claim payments. It is important to bear in 
mind that the liabilities of a life insurer are of a long-term nature (due to the sale 
of products such as annuities) and are subject to many categories of risk including 
interest-rate risk, inflation risk and longevity risk, some of which also affect the asset 
side. An important aspect of the risk-management strategy of an insurance company 
is, therefore, to hedge parts of these risks by proper investment of the premium 
income (so-called liability-driven investment). 

It should be clear from this discussion that a sound approach to risk management 
cannot look at one side of the balance sheet in isolation from the other. 


2.1.3 Capital 


There are many different notions of bank capital, and three broad concepts can be 
distinguished: equity (or book) capital, regulatory capital and economic capital. All 
of these notions of capital refer to items on the liability side of the balance sheet that 
entail no (or very limited) obligations to outside creditors and that can thus serve as 
a buffer against losses. 

The equity capital can be read from the balance sheet according to the balance 
sheet equation in (2.1). It is therefore a measure of the value of the company to 
the shareholders. The balance sheet usually gives a more detailed breakdown of the 
equity capital by listing separate positions for shareholder capital, retained earnings 
and other items of lesser importance. Shareholder capital is the initial capital invested 
in the company by purchasers of equity. For companies financed by a single share 
issue, this is given by the numbers of shares issued multiplied by their price at the 
issuance date. Shareholder capital is therefore different from market capitalization, 
which is given by the number of shares issued multiplied by their current market 
price. Retained earnings are the accumulated earnings that have not been paid out 
in the form of dividends to shareholders; these can in principle be negative if the 
company has made losses. 

Regulatory capital is the amount of capital that a company should have accord- 
ing to regulatory rules. For a bank, the rules are set out in the Basel framework, 
as described in more detail in Section 1.3.1. For European insurance companies,
regulatory capital takes the form of a minimum capital requirement and a solvency 
capital requirement as set out in the Solvency II framework (see Section 1.3.2). 

A regulatory capital framework generally specifies the amount of capital necessary
for a financial institution to continue its operations, taking into account the size and 
the riskiness of its positions. Moreover, it specifies the quality of the capital and 
hence the form it should take on the balance sheet. In this context one usually 
distinguishes between different numbered capital tiers. 

For example, in the Basel framework, Tier 1 capital is the sum of shareholder 
capital and retained earnings; in other words, the main constituents of the equity 
capital. This capital can act in full as a buffer against losses as there are no other 
claims on it. Tier 2 capital includes other positions of the balance sheet, in particular 
subordinated debt. Holders of this debt would effectively be the last to be paid before 
the shareholders in the event of the liquidation of the company, so subordinated 
debt can be viewed as an extra layer of protection for depositors and other senior 
debtholders. For illustration, the bank in Table 2.1 has Tier 1 capital of £30 million 
(assuming the equity capital consists of shareholder capital and retained earnings 
only) and Tier 2 capital of £15 million (its subordinated bond issues).

Economic capital is an estimate of the amount of capital that a financial institution 
needs in order to control the probability of becoming insolvent, typically over a one- 
year horizon. It is an internal assessment of risk capital that is guided by economic 
modelling principles. In particular, an economic capital framework attempts to take 
a holistic view that looks at assets and liabilities simultaneously, and works, where 
possible, with fair or market-consistent values of balance sheet items. Although, his- 
torically, regulatory capital frameworks have been based more on relatively simple 
rules and on book values for balance sheet items, there is increasing convergence 
between the economic and regulatory capital concepts, particularly in the insurance 
world, where Solvency II emphasizes market-consistent valuation of liabilities. 

Note that the various notions of capital refer to the way in which a financial firm 
finances itself and not to the assets it invests in. In particular, capital requirements 
do not require the setting aside of funds that cannot be invested productively, e.g. by 
issuing new loans. There are other forms of financial regulation that refer to the asset 
side of the balance sheet and restrict the investment possibilities, such as obligatory 
cash reserves for banks and constraints on the proportion of insurance assets that 
may be invested in stocks. 


Notes and Comments 


A good introduction to the business of banking and the risks affecting banks is 
Choudhry (2012), while Thoyts (2010) provides a very readable overview of theory 
and practice in the insurance industry, with a focus on the UK. Readers wanting to go 
deeper into the subject of balance sheets have many financial accounting textbooks 
to choose from, a popular one being Elliott and Elliott (2013). A paper that gives 
more explanation of fair-value accounting and also discusses issues raised by the 
financial crisis is Ryan (2008). 

Regulatory capital in the banking industry is covered in many of the documents
produced by the Basel Committee, in particular the papers covering the Basel II 
and Basel III capital frameworks (Basel Committee on Banking Supervision 2006, 
2011). For regulatory capital under Solvency II, see Sandström (2011). Textbook
treatments of the management of bank capital given regulatory constraints are found 
in Matten (2000) and Klaassen and van Eeghen (2009), while Admati et al. (2013) 
provides a strong argument for capital regulation that ensures banks have a high level 
of equity capital. This issue is discussed at a slightly less technical level in the book by 
Admati and Hellwig (2013). A good explanation of the concept of economic capital 
may be found in the relevant entry in the Encyclopedia of Quantitative Finance 
(Rosen and Saunders 2010). 


2.2 Modelling Value and Value Change 


We have seen in Section 2.1.1 that an analysis of the risks faced by a financial 
institution requires us to consider the change in the value of its assets and liabilities. 
In Section 2.2.1 we set up a formal framework for modelling value and value change 
and illustrate this framework with stylized asset and liability portfolios. With the 
help of these examples we take a closer look at valuation methods in Section 2.2.2. 
Finally, in Section 2.2.3 we discuss the different approaches that are used to construct 
loss distributions for portfolios over given time horizons. 


2.2.1 Mapping Risks 


In our general mathematical model for describing financial risks we represent the 
uncertainty about future states of the world by a probability space $(\Omega, \mathcal{F}, P)$, which
is the domain of all random variables (rvs) we introduce below. 

We consider a given portfolio of assets and, in some cases, liabilities. At the 
simplest level, this could be a collection of stocks or bonds, a book of derivatives or 
a collection of risky loans. More generally, it could be a portfolio of life insurance 
contracts (liabilities) backed by investments in securities such as bonds, or even a 
financial institution’s overall balance sheet. We denote the value of the portfolio
at time $t$ by $V_t$ and assume that the rv $V_t$ is known, or can be determined from
information available, at time $t$. Of course, the valuation of many positions on a
financial firm’s balance sheet is a challenging task; we return to this issue in more 
detail in Section 2.2.2. 

We consider a given risk-management time horizon $\Delta t$, which might be one day
or ten days in market risk, or one year in credit, insurance or enterprise-wide risk 
management. To develop a simple formalism for talking about value, value change 
and the role of risk factors, we will make two simplifying assumptions: 


• the portfolio composition remains fixed over the time horizon; and

• there are no intermediate payments of income during the time period.


While these assumptions may hold approximately for a one-day or ten-day horizon, 
they are unlikely to hold over one year, where items in the portfolio may mature 
and be replaced by other investments and where dividend or interest income may
accumulate. In specific situations it would be possible to relax these assumptions, 
e.g. by specifying simple rebalancing rules for portfolios or by taking intermediate 
income into account. 

Using a time-series notation (with time recorded in multiples of the time horizon
$\Delta t$) we write the value of the portfolio at the end of the time period as $V_{t+1}$ and
the change in value of the portfolio as $\Delta V_{t+1} = V_{t+1} - V_t$. We define the loss
to be $L_{t+1} := -\Delta V_{t+1}$, which is natural for short time intervals. For longer time
intervals, on the other hand, this definition neglects the time value of money, and an
alternative would be to define the loss to be $V_t - V_{t+1}/(1 + r_{t,1})$, where $r_{t,1}$ is the
simple risk-free interest rate that applies between times $t$ and $t + 1$; this measures
the loss in units of money at time $t$. The rv $L_{t+1}$ is typically random from the
viewpoint of time $t$, and its distribution is termed the loss distribution. Practitioners
in risk management are often concerned with the so-called P&L distribution. This
is the distribution of the change in portfolio value $\Delta V_{t+1}$. In this text we will often
focus on $L_{t+1}$ as this simplifies the application of many statistical methods and is
in keeping with conventions in actuarial risk theory.
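
The two loss definitions in the preceding paragraph can be compared in a short Python sketch. The function names and the numerical inputs (portfolio values of 100 and 98, and a simple one-period risk-free rate of 2%) are hypothetical and chosen only for illustration.

```python
def loss(v_t, v_next):
    """Loss defined as the negative change in value: -(V_{t+1} - V_t)."""
    return -(v_next - v_t)


def discounted_loss(v_t, v_next, r):
    """Alternative definition V_t - V_{t+1} / (1 + r), where r is the simple
    risk-free rate between t and t + 1; measured in units of money at time t."""
    return v_t - v_next / (1.0 + r)


v_t, v_next = 100.0, 98.0  # hypothetical portfolio values at t and t + 1
r = 0.02                   # hypothetical simple one-period risk-free rate

print(loss(v_t, v_next))                # 2.0
print(discounted_loss(v_t, v_next, r))  # roughly 3.92
```

Over a one-day horizon the rate is tiny and the two definitions nearly coincide; over a one-year horizon, as in this sketch, the difference can be material.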

The value $V_t$ is typically modelled as a function of time and a $d$-dimensional
random vector $Z_t = (Z_{t,1}, \ldots, Z_{t,d})'$ of risk factors, i.e. we have the representation

$$V_t = f(t, Z_t) \tag{2.2}$$

for some measurable function $f : \mathbb{R}_+ \times \mathbb{R}^d \to \mathbb{R}$. Risk factors are usually assumed
to be observable, so the random vector $Z_t$ takes some known realized value $z_t$ at
time $t$ and the portfolio value $V_t$ has realized value $f(t, z_t)$. The choice of the risk
factors and of $f$ is of course a modelling issue and depends on the portfolio at hand,
on the data available and on the desired level of precision (see also Section 2.2.2). A
representation of the portfolio value in the form (2.2) is termed a mapping of risks.
Some examples of the mapping procedure are provided below.

We define the random vector of risk-factor changes over the time horizon to be
$X_{t+1} := Z_{t+1} - Z_t$. Assuming that the current time is $t$ and using the mapping
(2.2), the portfolio loss is given by

$$L_{t+1} = -\bigl(f(t + 1, z_t + X_{t+1}) - f(t, z_t)\bigr), \tag{2.3}$$

which shows that the loss distribution is determined by the distribution of the
risk-factor change $X_{t+1}$.

If $f$ is differentiable, we may also use a first-order approximation $L^{\Delta}_{t+1}$ of the
loss in (2.3) of the form

$$L^{\Delta}_{t+1} := -\Bigl(f_t(t, z_t) + \sum_{i=1}^{d} f_{z_i}(t, z_t) X_{t+1,i}\Bigr), \tag{2.4}$$

where the subscripts on $f$ denote partial derivatives. The notation $L^{\Delta}$ stems from the
standard delta terminology in the hedging of derivatives (see Example 2.2 below).
The first-order approximation is convenient as it allows us to represent the loss as 
a linear function of the risk-factor changes. The quality of the approximation (2.4)
is obviously best if the risk-factor changes are likely to be small (i.e. if we are 
measuring risk over a short horizon) and if the portfolio value is almost linear in the 
risk factors (i.e. if the function f has small second derivatives). 
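
The relationship between the loss in (2.3) and its first-order approximation in (2.4) can be sketched generically in Python. This is a minimal illustration under stated assumptions: the partial derivatives are approximated by central finite differences rather than computed analytically, the mapping `f` (one unit of an asset whose log-price is the single risk factor) is hypothetical, and all function names are our own.

```python
import math


def full_loss(f, t, z, x):
    """Loss as in (2.3): -(f(t + 1, z + x) - f(t, z))."""
    z_next = [zi + xi for zi, xi in zip(z, x)]
    return -(f(t + 1, z_next) - f(t, z))


def linearized_loss(f, t, z, x, h=1e-6):
    """First-order loss as in (2.4), with the partial derivatives of f
    approximated by central finite differences of step h (an assumption
    made here for generality)."""
    total = (f(t + h, z) - f(t - h, z)) / (2.0 * h)  # derivative in time
    for i, xi in enumerate(x):
        z_up = list(z)
        z_dn = list(z)
        z_up[i] += h
        z_dn[i] -= h
        total += (f(t, z_up) - f(t, z_dn)) / (2.0 * h) * xi
    return -total


def f(t, z):
    """Hypothetical mapping: value of one unit of an asset with log-price z[0]."""
    return math.exp(z[0])


def approximation_error(x):
    """Absolute gap between the full and the linearized loss."""
    return abs(full_loss(f, 0, z, x) - linearized_loss(f, 0, z, x))


z = [math.log(100.0)]  # current realized risk-factor value
for x in ([0.01], [0.20]):
    print(x, full_loss(f, 0, z, x), linearized_loss(f, 0, z, x))
```

For the small risk-factor change the two losses nearly agree, while for the large change the gap widens, in line with the remark that the approximation works best for small changes and nearly linear portfolios.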

We now consider a number of examples from the areas of market, credit and insur- 
ance risk, illustrating how typical risk-management problems fit into this framework. 


Example 2.1 (stock portfolio). Consider a fixed portfolio of $d$ stocks and denote by
$\lambda_i$ the number of shares of stock $i$ in the portfolio at time $t$. The price process of stock $i$
is denoted by $(S_{t,i})_{t \in \mathbb{N}}$. Following standard practice in finance and risk management
we use logarithmic prices as risk factors, i.e. we take $Z_{t,i} := \ln S_{t,i}$, $1 \le i \le d$, and
we get $V_t = \sum_{i=1}^{d} \lambda_i e^{Z_{t,i}}$. The risk-factor changes $X_{t+1,i} = \ln S_{t+1,i} - \ln S_{t,i}$ then
correspond to the log-returns of the stocks in the portfolio. The portfolio loss from
time $t$ to $t + 1$ is given by

$$L_{t+1} = -(V_{t+1} - V_t) = -\sum_{i=1}^{d} \lambda_i S_{t,i} \bigl(e^{X_{t+1,i}} - 1\bigr),$$

and the linearized loss $L^{\Delta}_{t+1}$ is given by

$$L^{\Delta}_{t+1} = -\sum_{i=1}^{d} \lambda_i S_{t,i} X_{t+1,i} = -V_t \sum_{i=1}^{d} w_{t,i} X_{t+1,i}, \tag{2.5}$$

where the weight $w_{t,i} := (\lambda_i S_{t,i})/V_t$ gives the proportion of the portfolio value
invested in stock $i$ at time $t$. Given the mean vector and covariance matrix of the
distribution of the risk-factor changes, it is very easy to compute the first two moments
of the distribution of the linearized loss $L^{\Delta}_{t+1}$. Suppose that the random vector $X_{t+1}$
has a distribution with mean vector $\mu$ and covariance matrix $\Sigma$. Using general
rules for the mean and variance of linear combinations of a random vector (see also
equations (6.7) and (6.8)), we immediately get

$$E(L^{\Delta}_{t+1}) = -V_t w_t' \mu \quad\text{and}\quad \operatorname{var}(L^{\Delta}_{t+1}) = V_t^2 w_t' \Sigma w_t. \tag{2.6}$$
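
The quantities in Example 2.1 are easy to compute directly. The Python sketch below uses a hypothetical two-stock portfolio; the share counts, prices, mean vector and covariance matrix of the log-returns are invented for illustration, and the function and variable names are our own.

```python
import math

# Hypothetical inputs for a two-stock portfolio.
lam = [10.0, 20.0]        # number of shares held in each stock
s_t = [100.0, 50.0]       # current stock prices
mu = [0.001, 0.002]       # mean vector of the log-returns
sigma = [[0.0004, 0.0001],
         [0.0001, 0.0009]]  # covariance matrix of the log-returns

v_t = sum(l * s for l, s in zip(lam, s_t))   # portfolio value at time t
w = [l * s / v_t for l, s in zip(lam, s_t)]  # portfolio weights


def full_loss(x):
    """Portfolio loss for a vector x of log-returns."""
    return -sum(l * s * (math.exp(xi) - 1.0) for l, s, xi in zip(lam, s_t, x))


def linearized_loss(x):
    """Linearized loss as in (2.5): minus V_t times the weighted log-returns."""
    return -v_t * sum(wi * xi for wi, xi in zip(w, x))


# First two moments of the linearized loss, as in (2.6).
mean_lin = -v_t * sum(wi * mi for wi, mi in zip(w, mu))
var_lin = v_t ** 2 * sum(
    w[i] * sigma[i][j] * w[j] for i in range(len(w)) for j in range(len(w))
)

x = [-0.02, 0.01]  # one hypothetical realization of the log-returns
print(full_loss(x), linearized_loss(x))
print(mean_lin, math.sqrt(var_lin))
```

With these inputs the portfolio value is 2000 with equal weights, so the linearized loss has mean close to -3 and variance close to 1500; the full and linearized losses for the sample realization differ only slightly because the log-returns are small.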


Example 2.2 (European call option). We now consider a simple example of a portfolio
of derivative securities: namely, a standard European call on a non-dividend-paying
stock with maturity time $T$ and exercise price $K$. We use the Black-Scholes
option-pricing formula for the valuation of our portfolio. The value of a call option
on a stock with price $S$ at time $t$ is given by

$$C^{\mathrm{BS}}(t, S; r, \sigma, K, T) := S \Phi(d_1) - K e^{-r(T-t)} \Phi(d_2), \tag{2.7}$$

where $\Phi$ denotes the standard normal distribution function (df), $r$ represents the
continuously compounded risk-free interest rate, $\sigma$ denotes the volatility of the
underlying stock, and where

$$d_1 = \frac{\ln(S/K) + (r + \tfrac{1}{2}\sigma^2)(T - t)}{\sigma\sqrt{T - t}} \quad\text{and}\quad d_2 = d_1 - \sigma\sqrt{T - t}. \tag{2.8}$$

For notational simplicity we assume that the time to maturity of the option, $T - t$, is
measured in units of the time horizon, and that the parameters $r$ and $\sigma$ are expressed
in terms of those units; for example, if the time horizon is one day, then $r$ and $\sigma$
are the daily interest rate and volatility. This differs from standard market practice
where time is measured in years and $r$ and $\sigma$ are expressed in annualized terms.

To map the portfolio at time $t$, let $S_t$ denote the stock price at time $t$ and let $r_t$ and
$\sigma_t$ denote the values that a practitioner chooses to use at that time for the interest
rate and volatility. The log-price of the stock ($\ln S_t$) is an obvious risk factor for
changes in value of the portfolio. While in the Black-Scholes option-pricing model
the interest rate and volatility are assumed to be constant, in real markets interest
rates change constantly, as do the implied volatilities that practitioners tend to use as
inputs for the volatility parameter. Hence, we take $Z_t = (\ln S_t, r_t, \sigma_t)'$ as the vector
of risk factors.

According to the Black-Scholes formula the value of the call option at time $t$
equals $C^{\mathrm{BS}}(t, S_t; r_t, \sigma_t, K, T)$, which is of the form (2.2). The risk-factor changes
are given by

$$X_{t+1} = (\ln S_{t+1} - \ln S_t,\; r_{t+1} - r_t,\; \sigma_{t+1} - \sigma_t)',$$

and the linearized loss can be calculated to be

$$L^{\Delta}_{t+1} = -\bigl(C^{\mathrm{BS}}_t + C^{\mathrm{BS}}_S S_t X_{t+1,1} + C^{\mathrm{BS}}_r X_{t+1,2} + C^{\mathrm{BS}}_\sigma X_{t+1,3}\bigr), \tag{2.9}$$

where the subscripts denote partial derivatives of the Black-Scholes formula (2.7).
Note that we have omitted the arguments of $C^{\mathrm{BS}}$ to simplify the notation. Note also
that an $S_t$ term appears because we take the equity risk factor to be the log-price of
the stock rather than the price; applying the chain rule with $S = e^{z_1}$ we have

$$C^{\mathrm{BS}}_{z_1} = C^{\mathrm{BS}}_S \, \frac{\mathrm{d}S}{\mathrm{d}z_1}\bigg|_{z_1 = \ln S_t} = C^{\mathrm{BS}}_S S_t.$$

In Section 9.1.2 and Example 9.1 we give more detail concerning the derivation of 
mapping formulas similar to (2.9) and pay more attention to the choice of timescale 
in the mapping function. 

The derivatives of the Black-Scholes option-pricing function are often referred to
as the Greeks: $C^{BS}_S$ (the partial derivative with respect to the stock price $S$) is called
the delta of the option; $C^{BS}_t$ (the partial derivative with respect to time) is called the
theta of the option; $C^{BS}_r$ (the partial derivative with respect to the interest rate $r$) is
called the rho of the option; and, in a slight abuse of the Greek language, $C^{BS}_\sigma$ (the
partial derivative with respect to volatility $\sigma$) is called the vega of the option. The
Greeks play an important role in the risk management of derivative portfolios.
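
As a rough illustration of (2.9), the following Python sketch computes the four Greeks from the standard closed-form Black-Scholes expressions for a non-dividend-paying stock and compares the linearized loss with the loss obtained by full revaluation. All numerical inputs (a 250-day year, the strike, the rate and volatility levels, and the two risk-factor scenarios) are hypothetical choices made for this example, and the function names are our own.

```python
import math


def norm_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def norm_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def _d1_d2(t, s, r, sigma, k, big_t):
    tau = big_t - t
    d1 = (math.log(s / k) + (r + 0.5 * sigma**2) * tau) / (sigma * math.sqrt(tau))
    return d1, d1 - sigma * math.sqrt(tau)


def bs_call(t, s, r, sigma, k, big_t):
    """Black-Scholes value of a European call (no dividends)."""
    d1, d2 = _d1_d2(t, s, r, sigma, k, big_t)
    return s * norm_cdf(d1) - k * math.exp(-r * (big_t - t)) * norm_cdf(d2)


def bs_greeks(t, s, r, sigma, k, big_t):
    """Return (theta, delta, rho, vega): partials in t, S, r and sigma."""
    tau = big_t - t
    d1, d2 = _d1_d2(t, s, r, sigma, k, big_t)
    disc_k = k * math.exp(-r * tau)
    theta = -s * norm_pdf(d1) * sigma / (2.0 * math.sqrt(tau)) - r * disc_k * norm_cdf(d2)
    delta = norm_cdf(d1)
    rho = tau * disc_k * norm_cdf(d2)
    vega = s * norm_pdf(d1) * math.sqrt(tau)
    return theta, delta, rho, vega


def linearized_loss(t, s, r, sigma, k, big_t, x):
    """Linearized loss as in (2.9); x = (change in ln S, change in r, change in sigma)."""
    theta, delta, rho, vega = bs_greeks(t, s, r, sigma, k, big_t)
    return -(theta + delta * s * x[0] + rho * x[1] + vega * x[2])


def full_loss(t, s, r, sigma, k, big_t, x):
    """Loss from revaluing the option one period later at the shifted risk factors."""
    new_value = bs_call(t + 1, s * math.exp(x[0]), r + x[1], sigma + x[2], k, big_t)
    return -(new_value - bs_call(t, s, r, sigma, k, big_t))


# Hypothetical inputs in daily units (250 trading days per year assumed).
DAYS = 250
params = dict(t=0, s=100.0, r=0.02 / DAYS, sigma=0.25 / math.sqrt(DAYS),
              k=100.0, big_t=DAYS)
x_small = (-0.02, 1e-5, 5e-4)   # a modest one-day move
x_large = (-0.20, 1e-5, 5e-4)   # an extreme one-day move

for x in (x_small, x_large):
    print(x, linearized_loss(x=x, **params), full_loss(x=x, **params))
```

For the modest move the two losses should be close, whereas for the extreme move the gap widens noticeably, because the option value is a nonlinear function of the risk factors.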

The reader should keep in mind that for portfolios containing derivatives, the lin- 
earized loss can be a rather poor approximation of the true loss, since the portfolio 
value is often a highly nonlinear function of the risk factors. This has led to the devel- 
opment of higher-order approximations such as the delta-gamma approximation, 
where first- and second-order derivatives are used (see Section 9.1.2).

Example 2.3 (stylized loan portfolio). In this example we show how losses from
a portfolio of short-term loans fit into our general framework; a detailed discussion
of models for loan portfolios will be presented in Chapter 11.

Following standard practice in credit risk management, the risk-management horizon
$\Delta t$ is taken to be one year. We consider a portfolio of loans to $m$ different
borrowers or obligors that have been made at time $t$ and valued using a book-value
approach. To keep the example simple we assume that all loans have to be repaid at
time $t + 1$. We denote the amount to be repaid by obligor $i$ by $k_i$; this term comprises
the interest payment at $t + 1$ and the repayment of the loan principal. The exposure
to obligor $i$ is defined to be the present value of the promised interest and principal
cash flows, and it is therefore given by $e_i = k_i/(1 + r_{t,1})$.

In order to take the possibility of default into account we introduce a series of
random variables $(Y_{t,i})_{t \in \mathbb{N}}$ that represent the default state of obligor $i$ at $t$, and we let
$Y_{t,i} = 1$ if obligor $i$ has defaulted by time $t$, with $Y_{t,i} = 0$ otherwise. These variables
are known as default indicators. For simplicity we assume that all obligors are in a
non-default state at time $t$, so $Y_{t,i} = 0$ for all $1 \le i \le m$.

In keeping with valuation conventions in practice, we define the book value of
a loan to be the exposure of the loan reduced by the discounted expected loss
due to default; in this way the valuation includes a provision for default risk. We
assume that in the case of a default of borrower $i$, the lender can recover an amount
$(1 - \delta_i)k_i$ at the maturity date $t + 1$, where $\delta_i \in (0, 1]$ describes the so-called loss
given default of the loan, which is the percentage of the exposure that is lost in the
event of default. Moreover, we denote by $p_i$ the probability that obligor $i$ defaults
in the period $(t, t + 1]$. In this introductory example we suppose that $\delta_i$ and $p_i$ are
known constants. In practice, $p_i$ could be estimated using a credit scoring model
(see Section 10.2 for more discussion). The discounted expected loss due to a default
of obligor $i$ is thus given by

$$\frac{1}{1 + r_{t,1}}\,\delta_i p_i k_i = \delta_i p_i e_i.$$


The book value of loan $i$ is therefore equal to $e_i(1 - \delta_i p_i)$, the discounted expected
pay-off of the loan. Note that in practice, one would make further provisions for
administrative, refinancing and capital costs, but we ignore these issues for the sake
of simplicity. Moreover, one should keep in mind that the book value is not an
estimate for the fair value of the loan (an estimate for the amount for which the loan
could be sold in a securitization deal); the latter is usually lower than the discounted
expected pay-off of the loan, as investors demand a premium for bearing the default
risk (see also our discussion of risk-neutral valuation in the next section). The book
value of the loan portfolio at time $t$ is thus given by

$$V_t = \sum_{i=1}^{m} e_i(1 - \delta_i p_i).$$

The value of a loan to obligor $i$ at the maturity date $t + 1$ equals the size of the
repayment and is therefore equal to $k_i$ if $Y_{t+1,i} = 0$ (no default of obligor $i$) and
equal to $(1 - \delta_i)k_i$ if $Y_{t+1,i} = 1$ (default of obligor $i$). Hence, $V_{t+1}$, the value of the
portfolio at time $t + 1$, equals

$$V_{t+1} = \sum_{i=1}^{m} \bigl((1 - Y_{t+1,i})k_i + Y_{t+1,i}(1 - \delta_i)k_i\bigr) = \sum_{i=1}^{m} k_i(1 - \delta_i Y_{t+1,i}).$$

Since we use a relatively long risk-management horizon of one year, it is natural
to discount $V_{t+1}$ in computing the portfolio loss. Again using the fact that
$e_i = k_i/(1 + r_{t,1})$, we obtain

$$\begin{aligned}
L_{t+1} = -\left(\frac{V_{t+1}}{1 + r_{t,1}} - V_t\right) &= \sum_{i=1}^{m} e_i(1 - \delta_i p_i) - \sum_{i=1}^{m} e_i(1 - \delta_i Y_{t+1,i}) \\
&= \sum_{i=1}^{m} \delta_i e_i Y_{t+1,i} - \sum_{i=1}^{m} \delta_i e_i p_i,
\end{aligned}$$


which gives a simple formula for the portfolio loss involving exposures, default 
probabilities, losses given default and default indicators. 

Finally, we explain how this example fits into the mapping framework given
by (2.2) and (2.3). In this case the risk factors are the default indicator variables
$Z_t = (Y_{t,1}, \ldots, Y_{t,m})'$. If we write the mapping formula as

$$f(s, Z_s) = \sum_{i=1}^{m} (1 - Y_{s,i})\,\frac{k_i}{(1 + r_{s,1})^{t+1-s}}\,\bigl(1 - (t + 1 - s)\delta_i p_i\bigr) + \sum_{i=1}^{m} Y_{s,i} k_i (1 - \delta_i),$$

we see that this gives the correct portfolio values at times $s = t$ and $s = t + 1$.
The issue of finding and calibrating a good model for the joint distribution of the
risk-factor changes $Z_{t+1} - Z_t$ is taken up in Chapter 11.
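
The book value and the loss formula of Example 2.3 can be checked numerically with a short Python sketch. The three loans, their loss-given-default values, default probabilities and the 5% interest rate are hypothetical figures chosen for illustration; the sketch evaluates the loss both directly from the discounted portfolio values and from the closed-form expression, which should agree.

```python
def exposures_and_book_value(k, delta, p, r):
    """Exposures e_i = k_i / (1 + r) and book value V_t = sum e_i (1 - delta_i p_i)."""
    e = [ki / (1.0 + r) for ki in k]
    v_t = sum(ei * (1.0 - di * pi) for ei, di, pi in zip(e, delta, p))
    return e, v_t


def portfolio_loss(k, delta, p, r, y_next):
    """Return the loss computed two ways: by discounting V_{t+1}, and by the formula."""
    e, v_t = exposures_and_book_value(k, delta, p, r)
    v_next = sum(ki * (1.0 - di * yi) for ki, di, yi in zip(k, delta, y_next))
    loss_direct = -(v_next / (1.0 + r) - v_t)
    loss_formula = (sum(di * ei * yi for di, ei, yi in zip(delta, e, y_next))
                    - sum(di * ei * pi for di, ei, pi in zip(delta, e, p)))
    return loss_direct, loss_formula


# Hypothetical portfolio of three loans.
k = [105.0, 210.0, 52.5]        # repayments due at t + 1
delta = [0.6, 0.4, 1.0]         # losses given default
p = [0.01, 0.02, 0.05]          # one-year default probabilities
r = 0.05                        # one-year interest rate r_{t,1}
y_next = [0, 1, 0]              # scenario: only the second obligor defaults

e, v_t = exposures_and_book_value(k, delta, p, r)
loss_direct, loss_formula = portfolio_loss(k, delta, p, r, y_next)
print(e, v_t, loss_direct, loss_formula)
```

In the no-default scenario the loss is negative (a gain) equal to the provision for expected losses, which reflects the fact that the book value already includes that provision.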


Example 2.4 (insurance example). We consider a simple whole-life annuity prod- 
uct in which a policyholder (known as an annuitant) has purchased the right to 
receive a series of payments as long as he or she remains alive. Although realistic 
products would typically make monthly payments, we assume annual payments for 
simplicity and consider a risk-management horizon of one year. 

At time $t$ we assume that an insurer has a portfolio of $n$ annuitants with current
ages $x_i$, $i = 1, \ldots, n$. Annuitant $i$ receives a fixed annual annuity payment $\kappa_i$,
and the time of their death is represented by the random variable $\tau_i$. The annuity
payments are made in arrears at times $t + 1, t + 2, \ldots$, a form of product known as
a whole-life immediate annuity.

At time $t$ there is uncertainty about the value of the cash flow to any individual
annuitant stemming from the uncertainty about their time of death. The liability due
to a single annuitant takes the form

$$\sum_{h=1}^{\infty} I_{\{\tau_i > t+h\}}\,\kappa_i D(t, t + h),$$

where $D(t, t + h)$ is a discount factor that gives the time-$t$ value of one unit paid out
at time $t + h$. Following standard discrete-time actuarial practice we set
$D(t, t + h) = (1 + r_{t,h})^{-h}$, where $r_{t,h}$ is the $h$-year simple spot interest rate at time $t$. The expected
present value of this liability at time $t$ is given by

$$\sum_{h=1}^{\infty} q_t(x_i, h)\,\frac{\kappa_i}{(1 + r_{t,h})^{h}},$$

where we assume that $P(\tau_i > t + h) = q_t(x_i, h)$. In other words, the survival
probability of annuitant $i$ depends on only the current time $t$ and the age $x_i$ of the
annuitant; $q_t(x, h)$ represents the probability that an individual aged $x$ at time $t$ will
survive a further $h$ years.

If $n$ is sufficiently large, diversification arguments suggest that individual mortality
risk (deviation of the variables $\tau_1, \ldots, \tau_n$ from their expected values) may be
neglected, and the overall portfolio liability may be represented by

$$B_t = \sum_{i=1}^{n} \sum_{h=1}^{\infty} q_t(x_i, h)\,\frac{\kappa_i}{(1 + r_{t,h})^{h}}.$$

Now consider the liability due to a single annuitant at time $t + 1$, which is given
by

$$\sum_{h=1}^{\infty} I_{\{\tau_i > t+1+h\}}\,\kappa_i\,\frac{1}{(1 + r_{t+1,h})^{h}}.$$

We again use the large-portfolio diversification argument to replace $I_{\{\tau_i > t+1+h\}}$ by
its expected value $q_t(x_i, h + 1)$, and thus we approximate the portfolio liability at
$t + 1$ by

$$B_{t+1} \approx \sum_{i=1}^{n} \sum_{h=1}^{\infty} q_t(x_i, h + 1)\,\frac{\kappa_i}{(1 + r_{t+1,h})^{h}}.$$

The lump-sum premium payments of the annuitants would typically be invested
in a matching portfolio of bonds: that is, a portfolio chosen so that the cash flows
from the bonds closely match the cash flows due to the policyholders. We assume
that the investments have been made in (default-free) government bonds with $d$
different maturities (all greater than or equal to one year) so that the asset value at
time $t$ is

$$A_t = \sum_{j=1}^{d} \frac{\lambda_j}{(1 + r_{t,h_j})^{h_j}},$$

where $h_j$ is the maturity of the $j$th bond and $\lambda_j$ is the number of such bonds that
have been purchased. The net asset value of the portfolio at time $t$ is given by
$V_t = A_t - B_t$.

This is a situation in which it would be natural to discount future asset and liability
values back to time $t$, so that the loss (in units of time-$t$ money) would be given by

$$L_{t+1} = -\left(\frac{A_{t+1} - B_{t+1}}{1 + r_{t,1}} - (A_t - B_t)\right).$$

The risk factors in this example are the spot rates $Z_t = (r_{t,1}, \ldots, r_{t,m})'$, where $m$
represents the maximum time horizon at which an annuity payment might have to
be made. The mortality risk in the lifetime variables $\tau_1, \ldots, \tau_n$ is eliminated from
consideration by using the life table of fixed survival probabilities $\{q_t(x, h)\}$.
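
To make the liability formulas of Example 2.4 concrete, the following Python sketch evaluates $B_t$ and the approximation to $B_{t+1}$ for a small portfolio. The survival function used here is a purely hypothetical linear survival law with a limiting age of 110 (it stands in for a real life table), the spot curves are flat for simplicity, and the infinite sums are truncated at the length $m$ of the supplied spot-rate vector. The asset side is not modelled.

```python
def survival_prob(x, h, max_age=110):
    """Hypothetical q_t(x, h): probability that a life aged x < max_age survives h more years.

    A linear survival law is assumed purely for illustration; in practice
    these values would be read from a life table.
    """
    return max(0.0, 1.0 - h / (max_age - x))


def liability(ages, payments, spot_rates, shift=0):
    """Expected present value of annuity payments.

    shift=0 with the time-t spot rates gives B_t; shift=1 with the time-(t+1)
    spot rates gives the approximation to B_{t+1}. spot_rates[h-1] is the
    h-year spot rate, and the sum over h is truncated at len(spot_rates).
    """
    total = 0.0
    for x, kappa in zip(ages, payments):
        for h, r in enumerate(spot_rates, start=1):
            total += survival_prob(x, h + shift) * kappa / (1.0 + r) ** h
    return total


# Hypothetical portfolio: two annuitants and flat spot curves over m = 45 years.
ages = [65, 70]
payments = [1000.0, 1200.0]
curve_t = [0.020] * 45          # spot rates r_{t,h}
curve_next = [0.025] * 45       # one possible realization of r_{t+1,h}

b_t = liability(ages, payments, curve_t)
b_next = liability(ages, payments, curve_next, shift=1)
print(b_t, b_next)
```

Because the survival probabilities are held fixed, the only source of randomness in the liability at time $t + 1$ in this sketch is the spot curve, in line with the choice of risk factors in the example.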


2.2.2 Valuation Methods 


We now take a closer look at valuation principles in the light of the stylized examples 
of the previous section. While the loan portfolio of Example 2.3 would typically 
be valued using a book-value approach, as indicated, the stock portfolio (Exam- 
ple 2.1), the European call option (Example 2.2) and the asset-backed annuity port- 
folio (Example 2.4) would all be valued using a fair-value approach in practice. In 
this section we elaborate on the different methods used in fair-value accounting and 
explain how risk-neutral valuation may be understood as a special case of fair-value 
accounting. 

We recall from Section 2.1.1 that the use of fair-value methodology for the assets 
and liabilities of an insurer is closely related to the concept of market-consistent 
valuation. The main practical difference is that the fair-value approach is applied 
to the accounting balance sheet for reporting purposes, whereas market-consistent 
valuation is applied to the Solvency II balance sheet for capital adequacy purposes. 
While there are differences in detail between the two rule books, it is sufficient for our 
purposes to view market-consistent valuation as a variant of fair-value accounting. 


Fair-value accounting. In general terms, the fair value of an asset is an estimate 
of the price that would be received in selling the asset in a transaction on an active 
market. Similarly, the fair value of a liability is an estimate of the price that would 
have to be paid to transfer the liability to another party in a market-based transaction; 
this is sometimes referred to as the exit value. 

Only a minority of balance sheet positions are traded directly in an active market. 
Accountants have therefore developed a three-stage hierarchy of fair-value account- 
ing methods, extending fair-value accounting to non-traded items. This hierarchy, 
which is codified in the US as Financial Accounting Standard 157 and worldwide 
in the 2009 amendment to International Financial Reporting Standard 7, has the 
following levels. 


Level 1: the fair value of an instrument is determined from quoted prices in an 
active market for the same instrument, without modification or repackaging. 


Level 2: the fair value of an instrument is determined using quoted prices in active 
markets for similar (but not identical) instruments or by the use of valuation 
techniques, such as pricing models for derivatives, for which all significant inputs 
are based on observable market data. 


Level 3: the fair value of an instrument is estimated using a valuation technique 
(pricing model) for which some key inputs are not observable market data (or 
otherwise publicly observable quantities). 


In risk-management language these levels are sometimes described as mark-to- 
market, mark-to-model with objective inputs, and mark-to-model with subjective 
inputs.

The stock portfolio in Example 2.1 is a clear example of Level 1 valuation: the
portfolio value is determined by simply looking up current market prices of the 
stocks. 

Now consider the European call option of Example 2.2 and assume that the 
option is not traded on the market, perhaps because of a non-standard strike price 
or maturity, but that there is an otherwise active market for options on that stock. 
This would be an example of Level 2 valuation: a valuation technique (namely, the 
Black-Scholes option-pricing formula) is used to price the instrument. The inputs 
to the formula are the stock price, the interest rate and the implied volatility, which 
are market observables (since we assumed that there is an active market for options 
on the stock). 

The insurance portfolio from Example 2.4 can be viewed as an example of Level 2 
or Level 3 valuation, depending on the methods used to determine the input parame- 
ters. If the survival probabilities are determined from publicly available sources such 
as official life tables, the annuity example corresponds to Level 2 valuation since 
the other risk factors are essentially market observables, with the possible exception 
of long-term interest rates. If, on the other hand, proprietary data and methods are 
used to estimate the survival probabilities, then this would be Level 3 valuation. 


Risk-neutral valuation. Risk-neutral valuation is a special case of fair-value 
accounting that is widely used in the pricing of financial products such as derivative 
securities. In risk-neutral pricing the values of financial instruments are computed 
as expected discounted values of future cash flows, where expectation is taken with 
respect to some probability measure Q, called a risk-neutral pricing measure. Q is 
an artificial measure that turns the discounted prices of traded securities into so- 
called martingales (fair bets), and it is also known as an equivalent martingale 
measure. Calibration procedures are used to ensure that prices obtained in this way 
are consistent with quoted market prices. 

Hitherto, all our probabilities and expectations have been taken with respect to the
physical or real-world measure $P$. In order to explain the concept of a risk-neutral
measure $Q$ and to illustrate the relationship between $P$ and $Q$, we use a simple
one-period model from the field of credit risk, which we refer to as the basic one-period
default model. We consider a defaultable zero-coupon bond with maturity
$T$ equal to one year and make the following assumptions: the real-world default
probability is $p = 1\%$; the recovery rate $1 - \delta$ (the proportion of the notional of
the bond that is paid back in the case of a default) is deterministic and is equal
to 60%; the risk-free simple interest rate equals 5%; the current ($t = 0$) price of
the bond is $p_1(0, 1) = 0.941$; the price of the corresponding default-free bond
is $p_0(0, 1) = (1.05)^{-1} = 0.952$. The price evolution of the bond is depicted in
Figure 2.1.

The expected discounted value of the bond equals
$(1.05)^{-1}(0.99 \cdot 1 + 0.01 \cdot 0.6) = 0.949 > p_1(0, 1)$. We see that in this example the price $p_1(0, 1)$ is smaller than the
expected discounted value of the claim. This is the typical situation in real markets
for corporate bonds, as investors demand a premium for bearing the default risk of
the bond.

[Figure 2.1: one-period price tree. Starting from the price 0.941 at $t = 0$, the bond is worth 1.0 at $t = 1$ (no default) or 0.6 (default, recovery = 60%).]

Figure 2.1. Evolution of the price $p_1(\cdot, 1)$ of a defaultable bond in the basic one-period
default model; the probabilities of the upper and lower branches are 0.99 and 0.01, respectively.


In a one-period model, an equivalent martingale measure or risk-neutral measure 
is simply a new probability measure Q such that for every traded security the Q- 
expectation of the discounted pay-off equals the current price of the security, so that 
investing in this security becomes a fair bet. In more general situations (for example, 
in continuous-time models), the idea of a fair bet is formalized by the requirement 
that the discounted price process of a traded security is a so-called Q-martingale 
(hence the name martingale measure). In the basic one-period default model, $Q$ is
thus given in terms of an artificial default probability $q$ such that

$$p_1(0, 1) = (1.05)^{-1}\bigl((1 - q) \cdot 1 + q \cdot 0.6\bigr).$$


Clearly, q is uniquely determined by this equation and we get that q = 0.03. Note 
that, in our example, q is bigger than the physical default probability p = 0.01; 
again, this is typical for real markets and reflects the risk premium demanded by 
buyers of defaultable bonds. The example also shows that different approaches are 
needed in order to determine the historical default probability p and the risk-neutral 
default probability q: the former is estimated from historical data such as the default 
history of firms of similar credit quality (see, for example, Sections 10.3.3 and 11.5), 
whereas q is calibrated to market prices of traded securities. 

Under the risk-neutral pricing approach, the price of a security is computed as the
(conditional) expected value of the discounted future cash flows, where expectation
is taken with respect to the risk-neutral measure $Q$. Denoting the pay-off of the
security at $t = 1$ by the rv $H$ and the risk-free simple interest rate between time 0
and time 1 by $r_{0,1} > 0$, we obtain the following formula for the value $V_0^H$ of the
claim $H$ at $t = 0$:

$$V_0^H = E^Q\left(\frac{H}{1 + r_{0,1}}\right). \tag{2.10}$$


For a specific example in the basic one-period default model, consider a default put 
option that pays one unit at t = 1 if the bond defaults and zero otherwise; the option 
can be thought of as a simplified version of a credit default swap. Using risk-neutral 
pricing, the value of the option at t = 0 is given by 


$$V_0 = (1.05)^{-1}\bigl((1 - q) \cdot 0 + q \cdot 1\bigr) = (1.05)^{-1} \cdot 0.03 = 0.0285.$$
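
The arithmetic of the basic one-period default model can be reproduced in a few lines of Python. The sketch below uses only the numbers given in the text (real-world default probability 1%, recovery rate 60%, interest rate 5%, bond price 0.941); the function names are our own. It solves the pricing equation for the risk-neutral default probability and then prices the default put.

```python
def expected_discounted_payoff(prob_default, recovery, r):
    """Discounted expected pay-off of a unit-notional defaultable bond."""
    return ((1.0 - prob_default) * 1.0 + prob_default * recovery) / (1.0 + r)


def risk_neutral_default_prob(price, recovery, r):
    """Solve price = ((1 - q) * 1 + q * recovery) / (1 + r) for q."""
    return (1.0 - price * (1.0 + r)) / (1.0 - recovery)


def default_put_value(q, r):
    """Value at t = 0 of a claim paying one unit at t = 1 if the bond defaults."""
    return q / (1.0 + r)


r = 0.05
recovery = 0.6
p_real = 0.01
bond_price = 0.941

value_under_p = expected_discounted_payoff(p_real, recovery, r)
q = risk_neutral_default_prob(bond_price, recovery, r)
put = default_put_value(q, r)
print(round(value_under_p, 3), round(q, 4), round(put, 4))
```

Working with the unrounded value of $q$ (slightly below 0.03) gives a put value that rounds to 0.0285, in agreement with the figure quoted in the text.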


In continuous-time models one usually uses continuous compounding, and (2.10) 
is therefore replaced by the slightly more general expression 


$$V_t^H = E_t^Q\bigl(e^{-r(T-t)} H\bigr), \quad t < T. \tag{2.11}$$

Here, $T$ is the maturity date of the security and the subscript $t$ on the expectation
operator indicates that the expectation is taken with respect to the information
available to investors at time $t$, as will be explained in more detail in Chapter 10.

Formulas (2.10) and (2.11) are known as risk-neutral pricing rules. Risk-neutral 
pricing applied to non-traded financial products is a typical example of Level 2 
valuation: prices of traded securities are used to calibrate model parameters under the 
risk-neutral measure Q; this measure is then used to price the non-traded products. 
We give one example that underscores our use of the Black-Scholes pricing rule in 
Example 2.2. 


Example 2.5 (European call option in Black-Scholes model). Consider again the
European call option in Example 2.2 and suppose that options with our desired strike
$K$ and/or maturity time $T$ are not traded, but that other options on the same stock
are traded. We assume that under the real-world probability measure $P$ the stock
price $(S_t)$ follows a geometric Brownian motion model (the so-called Black-Scholes
model) with dynamics given by

$$\mathrm{d}S_t = \mu S_t\,\mathrm{d}t + \sigma S_t\,\mathrm{d}W_t$$

for constants $\mu \in \mathbb{R}$ (the drift) and $\sigma > 0$ (the volatility), and a standard Brownian
motion $(W_t)$. It is well known that there is an equivalent martingale measure $Q$
under which the discounted stock price $(e^{-rt} S_t)$ is a martingale; under $Q$, the stock
price follows a geometric Brownian motion model with drift $r$ and volatility $\sigma$. The
European call option pay-off is $H = (S_T - K)^+$ and the risk-neutral valuation
formula in (2.11) may be shown to take the form

$$V_t = E_t^Q\bigl(e^{-r(T-t)}(S_T - K)^+\bigr) = C^{BS}(t, S_t; r, \sigma, K, T), \quad t < T, \tag{2.12}$$

with $C^{BS}$ as in Example 2.2. To assign a risk-neutral value to the call option at
time $t$ (knowing the current price of the stock $S_t$, the interest rate $r$ and the option
characteristics $K$ and $T$), we need to calibrate the model parameter $\sigma$. As discussed
above, we would typically use quoted prices $C^{BS}(t, S_t; r, \sigma, K^*, T^*)$ for options
on the stock with different characteristics to infer a value for $\sigma$ and then plug the
so-called implied volatility into (2.12).
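
The calibration step in Example 2.5 can be sketched in Python as follows. The sketch assumes the standard Black-Scholes call formula for a non-dividend-paying stock, uses annualized units, and inverts the formula for $\sigma$ by simple bisection (one of several possible root-finding choices). The quoted option, its price and the characteristics of the non-traded option are hypothetical.

```python
import math


def norm_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call(t, s, r, sigma, k, big_t):
    """Black-Scholes value of a European call (no dividends)."""
    tau = big_t - t
    d1 = (math.log(s / k) + (r + 0.5 * sigma**2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    return s * norm_cdf(d1) - k * math.exp(-r * tau) * norm_cdf(d2)


def implied_vol(price, t, s, r, k, big_t, lo=1e-6, hi=5.0, tol=1e-10):
    """Find sigma with bs_call(...) = price by bisection.

    Relies on the call value being increasing in sigma; the bracket [lo, hi]
    is an assumption and must contain the solution.
    """
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if bs_call(t, s, r, mid, k, big_t) < price:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


# Hypothetical market data (annualized units).
t, s, r = 0.0, 100.0, 0.02
k_star, t_star = 105.0, 0.5                                  # traded option
quoted_price = bs_call(t, s, r, 0.20, k_star, t_star)        # stand-in for a market quote

sigma_implied = implied_vol(quoted_price, t, s, r, k_star, t_star)

# Price the non-traded option with strike K and maturity T using (2.12).
k, big_t = 110.0, 0.75
value = bs_call(t, s, r, sigma_implied, k, big_t)
print(sigma_implied, value)
```

In practice implied volatilities differ across strikes and maturities, so a single traded option is only a crude calibration input; the sketch ignores this.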


There are two theoretical justifications for risk-neutral pricing. First, a standard 
result of mathematical finance (the so-called first fundamental theorem of asset 
pricing) states that a model for security prices is arbitrage free if and only if it 
admits at least one equivalent martingale measure Q. Hence, if a financial product 
is to be priced in accordance with no-arbitrage principles, its price must be given 
by the risk-neutral pricing formula for some risk-neutral measure Q. A second 
justification refers to hedging: in financial models it is often possible to replicate the 
pay-off of a financial product by trading in the assets, a practice known as (dynamic) 
hedging, and it is well known that in a frictionless market the cost of carrying out 
such a hedge is given by the risk-neutral pricing rule. Advantages and limitations of 
risk-neutral pricing will be discussed in more detail in Section 10.4.2.


2.2.3 Loss Distributions


Having mapped the risks of a portfolio, we now consider how to derive loss distributions
with a view to using them in risk-management applications such as capital
setting. Assuming the current time is $t$ and recalling formula (2.3) for the loss over
the time period $[t, t + 1]$,

$$L_{t+1} = -\Delta V_{t+1} = -\bigl(f(t + 1, z_t + X_{t+1}) - f(t, z_t)\bigr),$$

we see that in order to determine the loss distribution (i.e. the distribution of $L_{t+1}$)
we need to do two things: (i) specify a model for the risk-factor changes $X_{t+1}$; and
(ii) determine the distribution of the rv $f(t + 1, z_t + X_{t+1})$.

Note that effectively two kinds of model enter into this process. The models used 
in (i) are projection models used to forecast the behaviour of risk factors in the real 
world, and they are generally estimated from empirical data describing past risk-
factor changes $(X_s)_{s \le t}$. Depending on the complexity of the positions involved, the
mapping function f in (ii) will typically also embody valuation models; consider 
in this context the use of the Black-Scholes model to value a European call option, 
as described in Examples 2.2 and 2.5. 

Broadly speaking, there are three kinds of method that can be used to address 
these challenges: an analytical method, a method based on the idea of historical 
simulation, or a simulation approach (also known as a Monte Carlo method). 


Analytical method. In an analytical method we attempt to choose a model for
$X_{t+1}$ and a mapping function $f$ in such a way that the distribution of $L_{t+1}$ can be
determined analytically. A prime example of this approach is the so-called variance-covariance
method for market-risk management, which dates back to the early work
of the RiskMetrics Group (JPMorgan 1996). In the variance-covariance method the
risk-factor changes $X_{t+1}$ are assumed to follow a multivariate normal distribution,
denoted by $X_{t+1} \sim N_d(\mu, \Sigma)$, where $\mu$ is the mean vector and $\Sigma$ the covariance
(or variance-covariance) matrix of the distribution. This would follow, for example,
from assuming that the risk factors $Z_t$ evolve in continuous time according to a
multivariate Brownian motion. The properties of the multivariate normal distribution
are discussed in detail in Section 6.1.3.

We also assume that the linearized loss in terms of the risk factors is a sufficiently
accurate approximation of the actual loss and simplify the problem by considering
the distribution of $L^{\Delta}_{t+1}$ defined in (2.4). The linearized loss will have general
structure

$$L^{\Delta}_{t+1} = -(c_t + b_t' X_{t+1}) \tag{2.13}$$

for some constant $c_t$ and constant vector $b_t$, which are known to us at time $t$. For a
concrete example, consider the stock portfolio of Example 2.1, where the loss takes
the form $L^{\Delta}_{t+1} = -V_t w_t' X_{t+1}$ and $w_t$ is the vector of portfolio weights at time $t$.

An important property of the multivariate normal distribution is that a linear
function (2.13) of $X_{t+1}$ must have a univariate normal distribution. From general
rules for calculating the mean and variance of linear combinations of a random
vector we obtain that

$$L^{\Delta}_{t+1} \sim N(-c_t - b_t'\mu,\; b_t'\Sigma b_t). \tag{2.14}$$


The variance-covariance method offers a simple solution to the risk-measurement
problem, but this convenience is achieved at the cost of two crude simplifying 
assumptions. First, linearization may not always offer a good approximation of the 
relationship between the true loss distribution and the risk-factor changes. Second, 
the assumption of normality is unlikely to be realistic for the distribution of the 
risk-factor changes, certainly for daily data and probably also for weekly and even 
monthly data. A stylized fact of empirical finance suggests that the distribution 
of financial risk-factor returns is leptokurtic and heavier tailed than the Gaussian 
distribution. In Section 3.1.2 we will present evidence for this observation in an 
analysis of daily, weekly, monthly and quarterly stock returns. The implication is 
that an assumption of Gaussian risk factors will tend to underestimate the tail of the 
loss distribution and thus underestimate the risk of the portfolio. 
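
The moments in (2.14) are straightforward to compute. The Python sketch below does so for a hypothetical two-stock portfolio, taking $c_t = 0$ and $b_t$ equal to the portfolio value times the weight vector, as in the stock-portfolio case; the mean vector, covariance matrix, portfolio value and the loss threshold of 30 are invented for illustration. It also evaluates an exceedance probability under the resulting normal distribution, which is exactly the quantity that the Gaussian assumption would tend to understate.

```python
import math
from statistics import NormalDist


def linearized_loss_moments(c, b, mu, cov):
    """Mean and variance of -(c + b'X) when X has mean mu and covariance cov."""
    d = len(b)
    mean = -c - sum(b[i] * mu[i] for i in range(d))
    var = sum(b[i] * cov[i][j] * b[j] for i in range(d) for j in range(d))
    return mean, var


# Hypothetical two-stock portfolio with risk factors equal to log-prices.
portfolio_value = 1000.0
weights = [0.6, 0.4]
b = [portfolio_value * w for w in weights]
mu = [0.001, 0.0005]                      # mean of daily log-returns
cov = [[0.0004, 0.0001],
       [0.0001, 0.0009]]                  # covariance of daily log-returns

mean, var = linearized_loss_moments(0.0, b, mu, cov)
loss_dist = NormalDist(mean, math.sqrt(var))
prob_exceed = 1.0 - loss_dist.cdf(30.0)   # P(linearized loss > 30) under normality
print(mean, var, prob_exceed)
```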


Remark 2.6. Note that we postpone a detailed discussion of how the model parameters
$\mu$ and $\Sigma$ are estimated from historical risk-factor changes $(X_s)_{s \le t}$ until later
chapters. We should, however, point out that when a dynamic model for $X_{t+1}$ is
considered, different estimation methods are possible depending on whether we
focus on the conditional distribution of $X_{t+1}$ given past values of the process or
whether we consider the equilibrium distribution in a stationary model. These dif- 
ferent approaches are said to constitute conditional and unconditional methods of 
computing the loss distribution—an issue we deal with in much more detail in 
Chapter 9. 


Historical simulation. Instead of estimating the distribution of $L_{t+1}$ in some
explicit parametric model for $X_{t+1}$, the historical-simulation method can be thought
of as estimating the distribution of the loss using the empirical distribution of past
risk-factor changes. Suppose we collect historical risk-factor change data over $n$
time periods and denote these data by $X_{t-n+1}, \ldots, X_t$. In historical simulation we
construct the following univariate data set of imaginary losses:

$$\{\tilde{L}_s = -\bigl(f(t + 1, z_t + X_s) - f(t, z_t)\bigr) : s = t - n + 1, \ldots, t\}.$$

The values $\tilde{L}_s$ show what would happen to the current portfolio if the risk-factor
changes in period $s$ were to recur. If we assume that the process of risk-factor changes
is stationary with df $F_X$, then (subject to further technical conditions) the empirical
df of the historically simulated losses is a consistent estimator of the loss distribution.
Estimators for any statistic of the loss distribution, such as the expected loss, the
variance of the loss, or the value-at-risk (see Section 2.3.2 for a definition), can be
computed from the empirical df of the historically simulated losses. For instance, the
expected loss can be estimated by $E(L_{t+1}) \approx n^{-1} \sum_{s=t-n+1}^{t} \tilde{L}_s$, and techniques
like empirical quantile estimation can be used to derive estimates of value-at-risk.
Further details can be found in Chapter 9.

The historical-simulation method has obvious attractions: it is easy to implement
and it reduces the loss distribution calculation problem to a one-dimensional prob- 
lem. However, the success of the approach is dependent on our ability to collect suf- 
ficient quantities of relevant synchronized historical data for all risk factors. As such, 
the method is mainly used in market-risk management for banks, where the issue 
of data availability is less of a problem (due to the relatively short risk-management 
time horizon). 
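
A minimal Python sketch of historical simulation is given below. It assumes a stock portfolio whose risk factors are log-prices, so that the mapping is a sum of holdings times exponentiated risk factors; the holdings, current prices and the short history of risk-factor changes are hypothetical, and the quantile rule (the smallest order statistic whose empirical df reaches the chosen level) is one of several possible conventions.

```python
import math


def make_stock_mapping(holdings):
    """Mapping f(time, z) for a stock portfolio with log-price risk factors."""
    def f(time, z):
        return sum(lam * math.exp(zi) for lam, zi in zip(holdings, z))
    return f


def historical_losses(f, t, z_t, x_hist):
    """Apply each historical risk-factor change to today's risk factors."""
    v_t = f(t, z_t)
    return [-(f(t + 1, [zi + xi for zi, xi in zip(z_t, x)]) - v_t) for x in x_hist]


def empirical_quantile(sample, alpha):
    """Smallest sample value whose empirical df is at least alpha."""
    ordered = sorted(sample)
    idx = max(0, math.ceil(alpha * len(ordered)) - 1)
    return ordered[idx]


# Hypothetical portfolio: 10 shares at price 100 and 5 shares at price 200.
f = make_stock_mapping([10.0, 5.0])
z_t = [math.log(100.0), math.log(200.0)]

# Hypothetical history of daily log-return vectors (far too short for real use).
x_hist = [(0.010, -0.004), (-0.020, -0.015), (0.003, 0.007),
          (-0.008, 0.012), (0.015, -0.021)]

losses = historical_losses(f, 0, z_t, x_hist)
mean_loss = sum(losses) / len(losses)
q95 = empirical_quantile(losses, 0.95)
print(losses, mean_loss, q95)
```

Note that each historical observation is a whole vector of simultaneous risk-factor changes, which is why synchronized data for all risk factors are needed.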


Monte Carlo method. Any approach to risk measurement that involves the simu- 
lation of an explicit parametric model for risk-factor changes is known as a Monte 
Carlo method. The method does not solve the problem of finding a multivariate 
model for $X_{t+1}$, and any results that are obtained will only be as good as the model
that is used. For large portfolios the computational cost of the Monte Carlo approach 
can be considerable, as every simulation requires the revaluation of the portfolio. 
This is particularly problematic if the portfolio contains many derivatives that cannot 
be priced in closed form. Such derivative positions might have to be valued using 
Monte Carlo approximation techniques, which are also based on simulations. This 
leads to situations where Monte Carlo procedures are nested and simulations are 
being generated within simulations, which can be very slow. 

Simulation techniques are frequently used in the management of credit portfolios 
(see, for example, Section 11.4). So-called economic scenario generation models, 
which are used in insurance, also fall under the heading of Monte Carlo methods. 
These are economically motivated and (typically) dynamic models for the evolution 
and interaction of different risk factors, and they can be used to generate realizations 
of $X_{t+1}$.
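
The Monte Carlo approach can be sketched in Python for a simple case. The sketch assumes a two-stock portfolio with log-price risk factors and a bivariate normal model for the risk-factor changes, simulated from independent standard normals via the Cholesky factor of the correlation matrix; all parameter values are hypothetical. Each simulated scenario is followed by a full revaluation of the portfolio, which is trivial here but is the expensive step for portfolios of derivatives.

```python
import math
import random


def simulate_losses(holdings, prices, mu, sigma, rho, n_sims, seed=0):
    """Simulate losses of a two-stock portfolio under bivariate normal log-returns."""
    rng = random.Random(seed)
    v_t = sum(lam * s for lam, s in zip(holdings, prices))
    losses = []
    for _ in range(n_sims):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        # Correlated risk-factor changes built from independent normals.
        x1 = mu[0] + sigma[0] * z1
        x2 = mu[1] + sigma[1] * (rho * z1 + math.sqrt(1.0 - rho**2) * z2)
        # Full revaluation of the portfolio in the simulated scenario.
        v_next = (holdings[0] * prices[0] * math.exp(x1)
                  + holdings[1] * prices[1] * math.exp(x2))
        losses.append(-(v_next - v_t))
    return losses


# Hypothetical model parameters for daily log-returns.
holdings = (10.0, 5.0)
prices = (100.0, 200.0)
mu = (0.0005, 0.0003)
sigma = (0.02, 0.015)
rho = 0.5

losses = simulate_losses(holdings, prices, mu, sigma, rho, n_sims=20000)
mc_mean = sum(losses) / len(losses)

# Exact expected loss for comparison, using E[exp(X)] = exp(mu + sigma^2 / 2).
exact_mean = -sum(lam * s * (math.exp(m + 0.5 * sd**2) - 1.0)
                  for lam, s, m, sd in zip(holdings, prices, mu, sigma))
print(mc_mean, exact_mean)
```

The comparison with the exact mean is only available because this toy model is tractable; the simulated sample itself says nothing about whether the normal model is adequate.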


Notes and Comments 


The concept of mapping portfolio values to fundamental risk factors was pioneered
by the RiskMetrics Group: see the RiskMetrics Technical Document (JPMorgan 
1996) and Mina and Xiao (2001). We explore the topic in more detail, with further 
examples, in Chapter 9. Other textbooks that treat the mapping of positions include 
Dowd (1998), Jorion (2007) and Volume III of Market Risk Analysis by Alexander 
(2009). The use of first-order approximations to the portfolio value (the so-called 
delta approximation) may be found in Duffie and Pan (1997); for second-order 
approximations, see Section 9.1.2. 

More details of the Black-Scholes valuation formula used in Example 2.2 may be 
found in many texts on options and derivatives, such as Haug (1998), Wilmott (2000)
and Hull (2014). For annuity products similar to the one analysed in Example 2.4 
and other standard life insurance products, good references are Hardy (2003), Møller 
and Steffensen (2007), Koller (2011), Dickson, Hardy and Waters (2013) and the 
classic book by Gerber (1997). 

The best resource for more on International Financial Reporting Standard 7 and 
the fair-value accounting of financial instruments is the International Financial 
Reporting Standards website at www.ifrs.org. Shaffer (2011) considers the impact of 
fair-value accounting on financial institutions, while Laux and Leuz (2010) address
the issue of whether fair-value accounting may have contributed to the financial
crisis. 

Market-consistent actuarial valuation in insurance is the subject of a textbook by 
Wüthrich, Bühlmann and Furrer (2010) (see also Wüthrich and Merz 2013). The
fundamental theorem of asset pricing and the conceptual underpinnings of risk-
neutral pricing are discussed in most textbooks on mathematical finance: see, for
example, Björk (2004) or Shreve (2004b).

The analytical method for deriving loss distributions based on an assumption of 
normal risk-factor changes belongs to the original RiskMetrics methodology cited 
above. For the analysis of the distribution of losses in a bank’s trading book, this 
has largely been supplanted by the use of historical simulation (Pérignon and Smith 
2010). Monte Carlo approaches (economic scenario generators) are widely used in 
internal models for Solvency II in the insurance industry (see Varnell 2011). 


2.3 Risk Measurement 


In very general terms a risk measure associates a financial position with loss L 
with a real number that measures the “riskiness of L”. In practice, risk measures 
are used for a variety of purposes. To begin with, they are used to determine the 
amount of capital a financial institution needs to hold as a buffer against unexpected 
future losses on its portfolio in order to satisfy a regulator who is concerned with 
the solvency of the institution. Similarly, they are used to determine appropriate 
margin requirements for investors trading at an organized exchange. Moreover, risk 
measures are often used by management as a tool for limiting the amount of risk 
a business unit within a firm may take. For instance, traders in a bank might be 
constrained by the rule that the daily 95% value-at-risk of their position should not 
exceed a given bound. 

In Section 2.3.1 we give an overview of some different approaches to measuring 
risk before focusing on risk measures that are derived from loss distributions. We 
introduce the widely used value-at-risk measure in Section 2.3.2 and explain how 
VaR features in risk capital calculations in Section 2.3.3. In Section 2.3.4 alternative 
risk measures derived from loss distributions are presented, and in Section 2.3.5 an 
introduction to the subject of desirable risk measure properties is given, in which 
the notions of coherent and convex risk measures are defined and examples are 
discussed. 


2.3.1 Approaches to Risk Measurement 


Existing approaches to measuring the risk of a financial position can be grouped 
into three categories: the notional-amount approach, risk measures based on loss 
distributions, and risk measures based on scenarios. 


Notional-amount approach. This is the oldest approach to quantifying the risk of 
a portfolio of risky assets. In the notional-amount approach the risk of a portfolio is 
defined as the sum of the notional values of the individual securities in the portfolio, 
where each notional value may be weighted by a factor representing an assessment 
of the riskiness of the broad asset class to which the security or instrument belongs. 
An example of this approach is the so-called standardized approach in the Basel 
regulatory framework (see Section 1.3.1 for a general description and Section 13.1.2 
for the standardized approach as it applies to operational risk). 

The advantage of the notional-amount approach is its apparent simplicity. How- 
ever, as noted in Section 1.3.1, the approach is flawed from an economic viewpoint 
for a number of reasons. To begin with, the approach does not differentiate between 
long and short positions and there is no netting. For instance, the risk of a long 
position in corporate bonds hedged by an offsetting position in credit default swaps 
would be counted as twice the risk of the unhedged bond position. Moreover, the 
approach does not reflect the benefits of diversification on the overall risk of the 
portfolio. For example, if we use the notional-amount approach, a well-diversified 
credit portfolio consisting of loans to many companies appears to have the same 
risk as a portfolio in which the whole amount is lent to a single company. Finally, 
the notional-amount approach has problems in dealing with portfolios of deriva- 
tives, where the notional amount of the underlying and the economic value of the 
derivative position can differ widely. 


Risk measures based on loss distributions. Most modern measures of the risk in 
a portfolio are statistical quantities describing the conditional or unconditional loss 
distribution of the portfolio over some predetermined horizon $\Delta t$. Examples include 
the variance, the VaR and the ES risk measures, which we discuss in more detail 
later in this chapter. Risk measures based on loss distributions have a number of 
advantages. The concept of a loss distribution makes sense on all levels of aggre- 
gation, from a portfolio consisting of a single instrument to the overall position of 
a financial institution. Moreover, if estimated properly, the loss distribution reflects 
netting and diversification effects. 

Two issues should be borne in mind when working with loss distributions. First, 
any estimate of the loss distribution is based on past data. If the laws governing 
financial markets change, these past data are of limited use in predicting future risk. 
Second, even in a stationary environment it is difficult to estimate the loss distri- 
bution accurately, particularly for large portfolios. Many seemingly sophisticated 
risk-management systems are based on relatively crude statistical models for the loss 
distribution (incorporating, for example, untenable assumptions of normality). These 
issues call for continual improvements in the way that loss distributions are estimated 
and, of course, for prudence in the practical application of risk-management models 
based on estimated loss distributions. In particular, risk measures based on the loss 
distribution should be complemented by information from hypothetical scenarios. 
Moreover, forward-looking information reflecting the expectations of market par- 
ticipants, such as implied volatilities, should be used in conjunction with statistical 
estimates (which are necessarily based on past information) in calibrating models 
of the loss distribution. 


Scenario-based risk measures. In the scenario-based approach to measuring the 
risk of a portfolio, one considers a number of possible future risk-factor changes 
(scenarios), such as a 10% rise in key exchange rates, a simultaneous 20% drop 
in major stock market indices or a simultaneous rise in key interest rates around 
the globe. The risk of the portfolio is then measured as the maximum loss of the 
portfolio under all scenarios. The scenarios can also be weighted for plausibility. 
This approach to risk measurement is the one that is typically adopted in stress 
testing. 

We now give a formal description. Fix a set $\mathcal{X} = \{x_1, \dots, x_n\}$ of risk-factor 
changes (the scenarios) and a vector $w = (w_1, \dots, w_n)' \in [0,1]^n$ of weights. 
Denote by $L(x)$ the loss the portfolio would suffer if the hypothetical scenario $x$ 
were to occur. Using the notation of Section 2.2.1 we get 

$$L(x) = -(f(t+1, z_t + x) - f(t, z_t)), \quad x \in \mathbb{R}^d.$$

The risk of the portfolio is then measured by 

$$\psi_{[\mathcal{X}, w]} := \max\{w_1 L(x_1), \dots, w_n L(x_n)\}. \tag{2.15}$$


Many risk measures that are used in practice are of the form (2.15). The fol- 
lowing is a simplified description of a system for determining margin requirements 
developed by the Chicago Mercantile Exchange (see Chicago Mercantile Exchange 
2010). To compute the initial margin for a simple portfolio consisting of a position 
in a futures contract and call and put options on this contract, sixteen different sce- 
narios are considered. The first fourteen consist of an up move or a down move of 
volatility combined with no move, an up move or a down move of the futures price 
by 1/3, 2/3 or 3/3 of a unit of a specified range. The weights $w_i$, $i = 1, \dots, 14$, of these 
scenarios are equal to 1. In addition, there are two extreme scenarios with weights 
$w_{15} = w_{16} = 0.35$. The amount of capital required by the exchange as margin for 
the portfolio is then computed according to (2.15). 
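
A weighted worst-case measure of the form (2.15) is simple to compute once the scenarios and the loss function are fixed. The following Python sketch illustrates the structure of the margin calculation just described; it is not the exchange's actual system. The loss function `toy_loss`, its sensitivities and the size of the two extreme moves are hypothetical choices made only for illustration.

```python
from typing import Callable, Sequence, Tuple

Scenario = Tuple[float, float]  # (futures-price move, volatility move)


def scenario_risk(loss: Callable[[Scenario], float],
                  scenarios: Sequence[Scenario],
                  weights: Sequence[float]) -> float:
    """Weighted worst-case loss max_i w_i * L(x_i), the form of (2.15)."""
    if len(scenarios) != len(weights):
        raise ValueError("need exactly one weight per scenario")
    if any(not 0.0 <= w <= 1.0 for w in weights):
        raise ValueError("weights must lie in [0, 1]")
    return max(w * loss(x) for x, w in zip(scenarios, weights))


def toy_loss(x: Scenario) -> float:
    """Hypothetical linearized loss; the sensitivities 100 and 40 are made up."""
    price_move, vol_move = x
    return -(100.0 * price_move + 40.0 * vol_move)


# Fourteen regular scenarios: seven price moves (0, +-1/3, +-2/3, +-3/3 of the
# range) combined with a volatility move down or up, each with weight 1.
regular = [(k / 3.0, dv) for k in range(-3, 4) for dv in (-1.0, 1.0)]
# Two extreme scenarios with weight 0.35; their size (3 ranges) is an assumption.
extreme = [(-3.0, 0.0), (3.0, 0.0)]

scenarios = regular + extreme
weights = [1.0] * len(regular) + [0.35] * len(extreme)
margin = scenario_risk(toy_loss, scenarios, weights)
print(f"margin = {margin:.2f}")
```

In this toy example the margin is set by the regular scenario in which price and volatility both move down (a loss of 140); the extreme scenario produces a loss of 300, but its weight of 0.35 scales it to 105.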


Remark 2.7. We can give a slightly different mathematical interpretation to formula 
(2.15), which will be useful in Section 2.3.5. Assume for the moment that 
$L(0) = 0$, i.e. that the value of the position is unchanged if all risk factors stay 
the same. This is reasonable, at least for a short risk-management horizon $\Delta t$. In 
that case, the expression $w_i L(x_i)$ can be viewed as the expected value of $L$ under 
a probability measure on the space of risk-factor changes; this measure associates 
a mass of $w_i \in [0, 1]$ to the point $x_i$ and a mass of $1 - w_i$ to the point 0. Denote 
by $\delta_x$ the probability measure associating a mass of one to the point $x \in \mathbb{R}^d$ and by 
$\mathcal{P}_{[\mathcal{X}, w]}$ the following set of probability measures on $\mathbb{R}^d$: 

$$\mathcal{P}_{[\mathcal{X}, w]} = \{w_1 \delta_{x_1} + (1 - w_1)\delta_0, \dots, w_n \delta_{x_n} + (1 - w_n)\delta_0\}.$$

Then $\psi_{[\mathcal{X}, w]}$ can be written as 

$$\psi_{[\mathcal{X}, w]} = \max\{E^P(L(X)) : P \in \mathcal{P}_{[\mathcal{X}, w]}\}. \tag{2.16}$$

A risk measure of the form (2.16), where $\mathcal{P}_{[\mathcal{X}, w]}$ is replaced by some arbitrary subset 
$\mathcal{P}$ of the set of all probability measures on the space of risk-factor changes, is termed 
a generalized scenario. Generalized scenarios play an important role in the theory 
of coherent risk measures (see Section 8.1). 

Scenario-based risk measures are a very useful risk-management tool for 
portfolios exposed to a relatively small set of risk factors, as in the Chicago Mercantile 
Exchange example. Moreover, they provide useful complementary information to 
measures based on statistics of the loss distribution. The main problem in setting 
up a scenario-based risk measure is, of course, determining an appropriate set of 
scenarios and weighting factors. 


2.3.2 Value-at-Risk 


VaR is probably the most widely used risk measure in financial institutions. It has 
a prominent role in the Basel regulatory framework and has also been influential in 
Solvency II. 

Consider a portfolio of risky assets and a fixed time horizon $\Delta t$, and denote by 
$F_L(l) = P(L \le l)$ the df of the corresponding loss distribution. We want to define 
a statistic based on $F_L$ that measures the severity of the risk of holding our portfolio 
over the time period $\Delta t$. An obvious candidate is the maximum possible loss, given 
by $\inf\{l \in \mathbb{R} : F_L(l) = 1\}$. However, for most distributions of interest, the maximum 
loss is infinity. Moreover, by using the maximum loss, any probability information 
in $F_L$ is neglected. The idea in the definition of VaR is to replace “maximum loss” 
by “maximum loss that is not exceeded with a given high probability”. 


Definition 2.8 (value-at-risk). Given some confidence level $\alpha \in (0, 1)$, the VaR of 
a portfolio with loss $L$ at the confidence level $\alpha$ is given by the smallest number $l$ 
such that the probability that the loss $L$ exceeds $l$ is no larger than $1 - \alpha$. Formally, 

$$\mathrm{VaR}_\alpha = \mathrm{VaR}_\alpha(L) = \inf\{l \in \mathbb{R} : P(L > l) \le 1 - \alpha\} = \inf\{l \in \mathbb{R} : F_L(l) \ge \alpha\}. \tag{2.17}$$


In probabilistic terms, VaR is therefore simply a quantile of the loss distribution. 
Typical values for $\alpha$ are $\alpha = 0.95$ or $\alpha = 0.99$; in market-risk management, the 
time horizon $\Delta t$ is usually one or ten days, while in credit risk management and 
operational risk management, $\Delta t$ is usually one year. Note that by its very definition 
the VaR at confidence level $\alpha$ does not give any information about the severity of 
losses that occur with a probability of less than $1 - \alpha$. This is clearly a drawback of 
VaR as a risk measure. For a small case study that illustrates this problem numerically 
we refer to Example 2.16 below. 

Figure 2.2 illustrates the notion of VaR. The probability density function of a loss 
distribution is shown with a vertical line at the value of the 95% VaR. Note that the 
mean loss is negative ($E(L) = -2.6$), indicating that we expect to make a profit, 
but the right tail of the loss distribution is quite long in comparison with the left tail. 
The 95% VaR value is approximately 2.2, indicating that there is a 5% chance that 
we lose at least this amount. 

Since quantiles play an important role in risk management, we recall the precise 
definition. 

Figure 2.2. An example of a loss distribution with the 95% VaR marked as a vertical line; 
the mean loss is shown with a dotted line and an alternative risk measure known as the 95% 
ES (see Section 2.3.4 and Definition 2.12) is marked with a dashed line. 


Definition 2.9 (the generalized inverse and the quantile function). 


(i) Given some increasing function $T: \mathbb{R} \to \mathbb{R}$, the generalized inverse of $T$ is 
defined by $T^{\leftarrow}(y) := \inf\{x \in \mathbb{R} : T(x) \ge y\}$, where we use the convention 
that the infimum of an empty set is $\infty$. 


(ii) Given some df $F$, the generalized inverse $F^{\leftarrow}$ is called the quantile function 
of $F$. For $\alpha \in (0, 1)$ the $\alpha$-quantile of $F$ is given by 

$$q_\alpha(F) := F^{\leftarrow}(\alpha) = \inf\{x \in \mathbb{R} : F(x) \ge \alpha\}.$$


For an rv $X$ with df $F$ we often use the alternative notation $q_\alpha(X) := q_\alpha(F)$. If 
$F$ is continuous and strictly increasing, we simply have $q_\alpha(F) = F^{-1}(\alpha)$, where 
$F^{-1}$ is the ordinary inverse of $F$. To compute quantiles in more general cases we 
may use the following simple criterion. 


Lemma 2.10. A point $x_0 \in \mathbb{R}$ is the $\alpha$-quantile of some df $F$ if and only if the 
following two conditions are satisfied: $F(x_0) \ge \alpha$; and $F(x) < \alpha$ for all $x < x_0$. 


The lemma follows immediately from the definition of the generalized inverse 
and the right-continuity of F. Examples of the computation of quantiles in certain 
tricky cases and further properties of generalized inverses are given in Section A.1.2. 
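
For a discrete loss distribution the df is a step function, so the ordinary inverse does not exist and the generalized inverse has to be used. The following Python sketch computes the quantile of a distribution with finitely many atoms and checks the two conditions of Lemma 2.10. The four-point loss distribution is hypothetical, and exact fractions are used (a design choice) so that the comparison with $\alpha$ is not affected by rounding.

```python
from fractions import Fraction


def quantile(points, probs, alpha):
    """Return q_alpha(F) = inf{x : F(x) >= alpha} for a discrete df F.

    `points` are the atoms and `probs` their probabilities (summing to one).
    """
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie in (0, 1)")
    cumulative = 0
    for x, p in sorted(zip(points, probs)):
        cumulative += p          # value of F at the atom x
        if cumulative >= alpha:  # first atom at which F reaches alpha
            return x
    return float("inf")          # convention: infimum of the empty set


def df(points, probs, x):
    """Distribution function F(x) = P(X <= x)."""
    return sum(p for pt, p in zip(points, probs) if pt <= x)


# Hypothetical loss distribution with four atoms.
losses = [-10, 0, 50, 200]
probs = [Fraction(1, 2), Fraction(2, 5), Fraction(9, 100), Fraction(1, 100)]

levels = (Fraction(9, 10), Fraction(95, 100), Fraction(99, 100), Fraction(995, 1000))
for alpha in levels:
    q = quantile(losses, probs, alpha)
    # Criterion of Lemma 2.10: F(q) >= alpha and F(x) < alpha for all x < q.
    assert df(losses, probs, q) >= alpha
    assert all(df(losses, probs, x) < alpha for x in losses if x < q)
    print(float(alpha), q)
```

Here both the 95% and the 99% quantile equal 50, because the df jumps from 0.9 to 0.99 at that point, while the 99.5% quantile is 200.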


Example 2.11 (VaR for normal and t loss distributions). Suppose that the loss 
distribution $F_L$ is normal with mean $\mu$ and variance $\sigma^2$. Fix $\alpha \in (0, 1)$. Then 

$$\mathrm{VaR}_\alpha = \mu + \sigma \Phi^{-1}(\alpha), \tag{2.18}$$

where $\Phi$ denotes the standard normal df and $\Phi^{-1}(\alpha)$ is the $\alpha$-quantile of $\Phi$. The 
proof is easy: since $F_L$ is strictly increasing, by Lemma 2.10 we only have to show 
that $F_L(\mathrm{VaR}_\alpha) = \alpha$. Now, 

$$P(L \le \mathrm{VaR}_\alpha) = P\left(\frac{L - \mu}{\sigma} \le \Phi^{-1}(\alpha)\right) = \Phi(\Phi^{-1}(\alpha)) = \alpha.$$

This result is routinely used in the variance-covariance approach (also known as 
the delta-normal approach) to computing risk measures. 

Of course, a similar result is obtained for any location-scale family, and another 
useful example is the Student t loss distribution. Suppose that our loss $L$ is such 
that $(L - \mu)/\sigma$ has a standard t distribution with $\nu$ degrees of freedom; we denote 
this loss distribution by $L \sim t(\nu, \mu, \sigma^2)$ and note that the moments are given by 
$E(L) = \mu$ and $\mathrm{var}(L) = \nu\sigma^2/(\nu - 2)$ when $\nu > 2$, so $\sigma$ is not the standard deviation 
of the distribution. We get 

$$\mathrm{VaR}_\alpha = \mu + \sigma t_\nu^{-1}(\alpha), \tag{2.19}$$

where $t_\nu$ denotes the df of a standard t distribution, which is available in most 
statistical computer packages along with its inverse. 
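
Formulas (2.18) and (2.19) translate directly into code. The following Python sketch uses only the standard library: the normal quantile comes from `statistics.NormalDist`, while the t quantile is obtained by integrating the t density numerically (Simpson's rule) and inverting the resulting df by bisection. This numerical route is our own illustrative choice, not a recommended production method; a statistical package would normally supply the inverse of $t_\nu$ directly.

```python
import math
from statistics import NormalDist


def t_pdf(x: float, nu: float) -> float:
    """Density of the standard t distribution with nu degrees of freedom."""
    c = math.gamma((nu + 1) / 2) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2))
    return c * (1 + x * x / nu) ** (-(nu + 1) / 2)


def t_cdf(x: float, nu: float, steps: int = 2000) -> float:
    """df of the standard t distribution via Simpson's rule on [0, |x|]."""
    a = abs(x)
    if a == 0.0:
        return 0.5
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, nu) + t_pdf(a, nu)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    area = total * h / 3
    return 0.5 + area if x > 0 else 0.5 - area


def t_quantile(alpha: float, nu: float) -> float:
    """Quantile of the standard t distribution by bisection on t_cdf."""
    if alpha < 0.5:
        return -t_quantile(1 - alpha, nu)  # symmetry of the t distribution
    lo, hi = 0.0, 1.0
    while t_cdf(hi, nu) < alpha:  # expand the bracket until it contains the quantile
        hi *= 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, nu) < alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def var_normal(alpha: float, mu: float, sigma: float) -> float:
    """VaR for a normal loss distribution, formula (2.18)."""
    return mu + sigma * NormalDist().inv_cdf(alpha)


def var_t(alpha: float, nu: float, mu: float, sigma: float) -> float:
    """VaR for a t(nu, mu, sigma^2) loss distribution, formula (2.19)."""
    return mu + sigma * t_quantile(alpha, nu)


print(round(var_normal(0.99, 0.0, 1.0), 3), round(var_t(0.99, 4, 0.0, 1.0), 3))
```

With $\mu = 0$ and $\sigma = 1$ the 99% quantile of the t distribution with four degrees of freedom is markedly larger than the normal one, but recall that $\sigma$ is a scale parameter and not the standard deviation in the t case.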


In the remainder of this section we discuss a number of further issues relating to 
the use of VaR as a risk measure in practice. 


Choice of VaR parameters. In working with VaR the parameters $\Delta t$ and $\alpha$ need to 
be chosen. There is of course no single optimal value for these parameters, but there 
are some considerations that might influence the choice of regulators or internal- 
model builders. 

The risk-management horizon $\Delta t$ should reflect the time period over which a 
financial institution is committed to hold its portfolio, which will be affected by 
contractual and legal constraints as well as liquidity considerations. In choosing a 
horizon for enterprise-wide risk management, a financial institution has little choice 
but to use the horizon appropriate for the market in which its core business activities 
lie. For example, insurance companies are typically bound to hold their portfolio 
of liabilities for one year, during which time they are not able to alter the portfolio 
or renegotiate the premiums they receive; one year is therefore an appropriate time 
horizon for measuring the risk in the liability and asset portfolios of an insurer. 
Moreover, a financial institution can be forced to hold a loss-making position in a 
risky asset if the market for that asset is not very liquid, so a relatively long horizon 
may be appropriate for illiquid assets. 

There are other, more practical, considerations that suggest that $\Delta t$ should be 
relatively small. The assumption that the composition of the portfolio remains 
unchanged is tenable only for a short holding period. Moreover, the calibration 
and testing of statistical models for historical risk-factor changes $(X_t)$ are easier if 
$\Delta t$ is small, since this typically means that we have more data at our disposal. 

For the confidence level $\alpha$, different values are also appropriate for different 
purposes. In order to set limits for traders, a bank would typically take $\alpha$ to be 95% 
and $\Delta t$ to be one day. For capital adequacy purposes higher confidence levels are 
generally used. For instance, the Basel capital charges for market risk in the trading 
book of a bank are based on the use of VaR at the 99% level and a ten-day horizon. 
The Solvency II framework uses a value of $\alpha$ equal to 0.995 and a one-year horizon. 
On the other hand, the backtesting of models that produce VaR figures often needs 
to be carried out at lower confidence levels using shorter horizons in order to have 
sufficient statistical power to detect poor model performance. 


Model risk and market liquidity. In practice, VaR numbers are sometimes given 
a very literal interpretation; the statement that the daily VaR at confidence level 
$\alpha = 99\%$ for a particular portfolio is equal to $l$ is understood to mean that “there 
is a probability of exactly 1% that the loss on this position will be larger than l”. 
This interpretation is misleading because it neglects estimation error, model risk and 
market liquidity risk. 

We recall that model risk is the risk that our model for the loss distribution is 
misspecified. For instance, we might work with a normal distribution to model 
losses, whereas the true distribution is heavy tailed, or we might fail to recognize 
the presence of volatility clustering or tail dependence (see Chapter 3) in modelling 
the distribution of the risk-factor changes underlying the losses. Of course, these 
problems are most pronounced if we are trying to estimate VaR at very high con- 
fidence levels. Liquidity risk refers to the fact that any attempt to liquidate a large 
loss-making position is likely to move the price against us, thus exacerbating the 
loss. 


2.3.3 VaR in Risk Capital Calculations 


Quantile-based risk measures are used in many risk capital calculations in practice. 
In this section we give two examples. 


VaR in regulatory capital calculations for the trading book. The VaR risk measure 
is applied to calculate a number of regulatory capital charges for the trading book of 
a bank. Under the internal-model approach a bank calculates a daily VaR measure 
for the distribution of possible ten-day trading book losses based on recent data on 
risk-factor changes under the assumption that the trading book portfolio is held fixed 
over this time period. We describe the statistical methodology that is typically used 
for this calculation in Section 9.2. 

While exact details may vary from one national regulator to another, the basic 
capital charge on day $t$ is usually calculated according to a formula of the form 

$$\mathrm{RC}^t = \max\left\{\mathrm{VaR}^{t,10}_{0.99},\ \frac{k}{60}\sum_{i=1}^{60} \mathrm{VaR}^{t-i+1,10}_{0.99}\right\}, \tag{2.20}$$

where $\mathrm{VaR}^{j,10}_{0.99}$ stands for the ten-day VaR at the 99% confidence level, calculated on 
day $j$, and where $k$ is a multiplier in the range 3 to 4 that is determined by the regulator 
as a function of the overall quality of the bank’s internal model. The averaging of 
the last sixty daily VaR numbers obviously tends to lead to smooth changes in the 
capital charge over time unless the most recent number $\mathrm{VaR}^{t,10}_{0.99}$ is particularly large. 
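
The smoothing effect of formula (2.20) is easy to see numerically. The following Python sketch implements the formula for a list of daily ten-day VaR figures; the function name `capital_charge`, the default multiplier $k = 3$ and the two VaR histories are illustrative assumptions rather than regulatory inputs.

```python
from typing import Sequence


def capital_charge(var_history: Sequence[float], k: float = 3.0,
                   window: int = 60) -> float:
    """Basic capital charge of the form (2.20).

    var_history holds ten-day 99% VaR figures in chronological order, with
    var_history[-1] the figure calculated on the current day t.
    """
    if len(var_history) < window:
        raise ValueError("need at least `window` daily VaR figures")
    recent = var_history[-window:]
    average = sum(recent) / window
    return max(recent[-1], k * average)


# Hypothetical histories: a constant VaR of 10, and the same with a final spike.
flat = [10.0] * 60
spike = [10.0] * 59 + [50.0]

print(capital_charge(flat))   # the scaled average binds
print(capital_charge(spike))  # the most recent VaR binds
```

For the flat history the charge is three times the average VaR (30). In the second history the spike to 50 exceeds the scaled average of 32, so the most recent figure determines the charge.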

A number of additional capital charges are added to $\mathrm{RC}^t$. These include a stressed 
VaR charge and an incremental risk charge, as well as a number of charges that are 
designed to take into account so-called specific risks due to idiosyncratic price 
movements in certain instruments that are not explained by general market-risk 
factors. The stressed VaR charge is calculated using similar VaR methodology to 
the standard charge but with data taken from a historical window in which markets 
were particularly volatile. The incremental risk charge is an estimate of the 99.9% 
quantile of the distribution of annual losses due to defaults and downgrades for 
credit-risky instruments in the trading book (excluding securitizations). 


The solvency capital requirement in Solvency II. An informal definition of the 
solvency capital requirement is “the level of capital that enables the insurer to meet 
its obligations over a one-year time horizon with a high confidence level (99.5%)” 
(this is taken from a 2007 factsheet produced by De Nederlandsche Bank). We will 
give an argument that leads to the use of a VaR-based risk measure. 

Consider the balance sheet of the insurer in Table 2.2 and assume that the current 
equity capital is given by $V_t = A_t - B_t$, i.e. the difference between the value of 
assets and the value of liabilities, or the net asset value; this is also referred to under 
Solvency II as own funds. The liabilities $B_t$ are considered to include all technical 
provisions computed in a market-consistent way, including risk margins for non- 
hedgeable risks where necessary. 

The insurer wants to ensure that it is solvent in one year’s time with high probability 
$\alpha$. It considers the possibility that it may need to raise extra capital and makes the 
following thought calculation. Given its current balance sheet and business model 
it attempts to determine the minimum amount of extra capital $x_0$ that it would have 
to raise now at time $t$ and place in a risk-free investment in order to be solvent in 
one year’s time with probability $\alpha$. In mathematical notation it needs to determine 

$$x_0 = \inf\{x : P(V_{t+1} + x(1 + r_{t,1}) \ge 0) = \alpha\},$$

where $r_{t,1}$ is the simple risk-free rate for a one-year investment and $V_{t+1}$ is the net 
asset value in one year’s time. If $x_0$ is negative, then the company is well capitalized 
at time $t$ and money could be taken out of the company. 

An easy calculation gives 

$$x_0 = \inf\{x : P(-V_{t+1} \le x(1 + r_{t,1})) = \alpha\}$$
$$\phantom{x_0} = \inf\{x : P(V_t - V_{t+1}/(1 + r_{t,1}) \le x + V_t) = \alpha\},$$

which shows that 

$$V_t + x_0 = q_\alpha(V_t - V_{t+1}/(1 + r_{t,1})).$$

The sum $V_t + x_0$ gives the solvency capital requirement: namely, the available capital 
corrected by the amount $x_0$. Hence, we see that the solvency capital requirement 
is a quantile of the distribution of $V_t - V_{t+1}/(1 + r_{t,1})$, a loss distribution that 
takes into account the time value of money through discounting, as discussed in 
Section 2.2.1. For a well-capitalized company with $x_0 < 0$, the amount $-x_0 = 
V_t - q_\alpha(V_t - V_{t+1}/(1 + r_{t,1}))$ (own funds minus the solvency capital requirement) 
is called the excess capital. 
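
The quantile representation of the solvency capital requirement lends itself to a simple simulation. The following Python sketch estimates the requirement from simulated values of next year's net asset value. The normal model for $V_{t+1}$, the figures used and the helper names are assumptions made purely for illustration; they are not part of the Solvency II rules.

```python
import math
import random


def empirical_quantile(sample, alpha):
    """Smallest order statistic whose empirical df is at least alpha."""
    ordered = sorted(sample)
    k = max(1, math.ceil(alpha * len(ordered)))
    return ordered[k - 1]


def solvency_capital_requirement(v_now, v_future, rate, alpha):
    """alpha-quantile of the discounted loss V_t - V_{t+1} / (1 + r)."""
    losses = [v_now - v / (1.0 + rate) for v in v_future]
    return empirical_quantile(losses, alpha)


# Hypothetical inputs: own funds of 100, a 2% risk-free rate and a normal model
# for next year's net asset value (mean 105, standard deviation 30).
rng = random.Random(2024)
v_t, r, alpha = 100.0, 0.02, 0.995
v_next = [rng.gauss(105.0, 30.0) for _ in range(100_000)]

scr = solvency_capital_requirement(v_t, v_next, r, alpha)
x0 = scr - v_t  # extra capital to raise now; a negative value means excess capital
print(f"SCR = {scr:.1f}, x0 = {x0:.1f}")
```

Under the assumed normal model the exact requirement is about 72.8, so the simulated figure should lie close to that value; since own funds are 100, $x_0$ is negative and the hypothetical insurer holds excess capital.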
2.3.4 Other Risk Measures Based on Loss Distributions 


In this section we provide short notes on a number of other statistical summaries 
of the loss distribution that are frequently used as risk measures in financial and 
insurance risk management. 


Variance. Historically, the variance of the P&L distribution (or, equivalently, the 
standard deviation) has been the dominating risk measure in finance. To a large 
extent this is due to the huge impact that the portfolio theory of Markowitz, which 
uses variance as a measure of risk, has had on theory and practice in finance (see, 
for example, Markowitz 1952). Variance is a well-understood concept that is easy to 
use analytically. However, as a risk measure it has two drawbacks. On the technical 
side, if we want to work with variance, we have to assume that the second moment 
of the loss distribution exists. While unproblematic for most return distributions in 
finance, this can cause problems in certain areas of non-life insurance or for the 
analysis of operational losses (see Section 13.1.4). On the conceptual side, since 
it makes no distinction between positive and negative deviations from the mean, 
variance is a good measure of risk only for distributions that are (approximately) 
symmetric around the mean, such as the normal distribution or a (finite-variance) 
Student t distribution. However, in many areas of risk management, such as in 
credit and operational risk management, we deal with loss distributions that are 
highly skewed. 


Lower and upper partial moments. Partial moments are measures of risk based on 
the lower or upper part of a distribution. In most of the literature on risk management 
the main concern is with the risk inherent in the lower tail of a P&L distribution, 
and lower partial moments are used to measure this risk. Under our sign convention 
we are concerned with the risk inherent in the upper tail of a loss distribution, so we 
focus on upper partial moments. Given an exponent $k \ge 0$ and a reference point $q$, 
the upper partial moment $\mathrm{UPM}(k, q)$ is defined as 

$$\mathrm{UPM}(k, q) = \int_q^\infty (l - q)^k \, dF_L(l) \in [0, \infty]. \tag{2.21}$$

Some combinations of $k$ and $q$ have a special interpretation: for $k = 0$ we obtain 
$P(L \ge q)$; for $k = 1$ we obtain $E((L - q) I_{\{L \ge q\}})$; for $k = 2$ and $q = E(L)$ we 
obtain the upper semivariance of L. Of course, the higher the value we choose for 
k, the more conservative our risk measure becomes, since we give more and more 
weight to large deviations from the reference point q. 
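
For a sample of losses, the integral in (2.21) becomes an average over the observations. The following Python sketch gives one possible empirical version; the function name and the five-point sample are hypothetical, and the range of integration is taken to include the reference point $q$ itself, which matters only when the sample has an observation exactly at $q$.

```python
from typing import Sequence


def upper_partial_moment(losses: Sequence[float], k: float, q: float) -> float:
    """Empirical UPM(k, q): average of (l - q)^k over the sample, counting
    only losses l >= q (all other observations contribute zero)."""
    if k < 0:
        raise ValueError("the exponent k must be non-negative")
    return sum((l - q) ** k for l in losses if l >= q) / len(losses)


# Hypothetical sample of losses (negative values are profits).
sample = [-3.0, -1.0, 0.0, 2.0, 6.0]
mean = sum(sample) / len(sample)

print(upper_partial_moment(sample, 0, 0.0))   # fraction of losses >= 0
print(upper_partial_moment(sample, 1, 0.0))   # average excess over 0
print(upper_partial_moment(sample, 2, mean))  # upper semivariance
```

The three calls correspond to the special cases mentioned above: an exceedance probability, an expected excess and the upper semivariance.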


Expected shortfall. ES is closely related to VaR and there is an ongoing debate 
in the risk-management community on the strengths and weaknesses of both risk 
measures. 


Definition 2.12 (expected shortfall). For a loss $L$ with $E(|L|) < \infty$ and df $F_L$, 
the ES at confidence level $\alpha \in (0, 1)$ is defined as 

$$\mathrm{ES}_\alpha = \frac{1}{1 - \alpha} \int_\alpha^1 q_u(F_L) \, du, \tag{2.22}$$

where $q_u(F_L) = F_L^{\leftarrow}(u)$ is the quantile function of $F_L$. 


The condition $E(|L|) < \infty$ ensures that the integral in (2.22) is well defined. By 
definition, ES is related to VaR by 

$$\mathrm{ES}_\alpha = \frac{1}{1 - \alpha} \int_\alpha^1 \mathrm{VaR}_u(L) \, du.$$

Instead of fixing a particular confidence level $\alpha$, we average VaR over all levels 
$u \ge \alpha$ and thus “look further into the tail” of the loss distribution. Obviously, $\mathrm{ES}_\alpha$ 
depends only on the distribution of $L$, and $\mathrm{ES}_\alpha \ge \mathrm{VaR}_\alpha$. See Figure 2.2 for a simple 
illustration of an ES value and its relationship to VaR. The 95% ES value of 4.9 is 
at least double the 95% VaR value of 2.2 in this case. 

For continuous loss distributions an even more intuitive expression can be derived 
that shows that ES can be interpreted as the expected loss that is incurred in the event 
that VaR is exceeded. 


Lemma 2.13. For an integrable loss $L$ with continuous df $F_L$ and for any $\alpha \in (0, 1)$ 
we have 

$$\mathrm{ES}_\alpha = \frac{E(L; L \ge q_\alpha(L))}{1 - \alpha} = E(L \mid L \ge \mathrm{VaR}_\alpha), \tag{2.23}$$

where we have used the notation $E(X; A) := E(X I_A)$ for a generic integrable rv 
$X$ and a generic set $A \in \mathcal{F}$. 


Proof. Denote by $U$ an rv with uniform distribution on the interval $[0, 1]$. It is 
a well-known fact from elementary probability theory that the rv $F_L^{\leftarrow}(U)$ has df 
$F_L$ (see Proposition 7.2 for a proof). We have to show that $E(L; L \ge q_\alpha(L)) = 
\int_\alpha^1 F_L^{\leftarrow}(u) \, du$. Now, 

$$E(L; L \ge q_\alpha(L)) = E(F_L^{\leftarrow}(U); F_L^{\leftarrow}(U) \ge F_L^{\leftarrow}(\alpha)) = E(F_L^{\leftarrow}(U); U \ge \alpha);$$

in the last equality we used the fact that $F_L^{\leftarrow}$ is strictly increasing since $F_L$ is 
continuous (see Proposition A.3 (iii)). Thus we get $E(F_L^{\leftarrow}(U); U \ge \alpha) = \int_\alpha^1 F_L^{\leftarrow}(u) \, du$. 
The second representation follows since, for a continuous loss distribution $F_L$, we 
have $P(L \ge q_\alpha(L)) = 1 - \alpha$. 


For an extension of this result to loss distributions with atoms, we refer to Proposi- 
tion 8.13. Next we use Lemma 2.13 to calculate the ES for two common continuous 
distributions. 
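
Before turning to closed-form results, the two representations of ES can be compared numerically. The following Python sketch does this for a standard normal loss: it averages the quantile function over $(\alpha, 1)$ as in Definition 2.12 and compares the result with the mean of simulated losses that reach the empirical VaR, as suggested by Lemma 2.13. The midpoint rule, the sample size and the seed are illustrative choices.

```python
import random
from statistics import NormalDist

alpha = 0.95
std_normal = NormalDist()

# (i) Definition 2.12: average the quantile function over (alpha, 1) using the
# midpoint rule, which never evaluates the quantile function at u = 1.
n = 20_000
h = (1.0 - alpha) / n
es_integral = sum(std_normal.inv_cdf(alpha + (i + 0.5) * h)
                  for i in range(n)) * h / (1.0 - alpha)

# (ii) Lemma 2.13: mean of the simulated losses that reach the empirical VaR.
rng = random.Random(1)
sample = sorted(rng.gauss(0.0, 1.0) for _ in range(200_000))
var_hat = sample[int(alpha * len(sample))]
tail = [loss for loss in sample if loss >= var_hat]
es_tail = sum(tail) / len(tail)

print(f"averaged VaR: {es_integral:.3f}, tail mean: {es_tail:.3f}")
```

The two numbers should agree up to simulation error, which is what the lemma asserts for a continuous loss distribution.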


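Example 2.14 (expected shortfall for Gaussian loss distribution). Suppose that 
the loss distribution $F_L$ is normal with mean $\mu$ and variance $\sigma^2$. Fix $\alpha \in (0, 1)$. 
Then 

$$\mathrm{ES}_\alpha = \mu + \sigma \frac{\phi(\Phi^{-1}(\alpha))}{1 - \alpha}, \tag{2.24}$$

where $\phi$ is the density of the standard normal distribution. The proof is elementary. 
First note that 

$$\mathrm{ES}_\alpha = \mu + \sigma E\left(\frac{L - \mu}{\sigma} \,\Big|\, \frac{L - \mu}{\sigma} \ge q_\alpha\left(\frac{L - \mu}{\sigma}\right)\right);$$

hence, it suffices to compute the ES for the standard normal rv $\tilde{L} := (L - \mu)/\sigma$. 
Here we get 

$$\mathrm{ES}_\alpha(\tilde{L}) = \frac{1}{1 - \alpha} \int_{\Phi^{-1}(\alpha)}^\infty l \phi(l) \, dl = \frac{1}{1 - \alpha} \left[-\phi(l)\right]_{\Phi^{-1}(\alpha)}^\infty = \frac{\phi(\Phi^{-1}(\alpha))}{1 - \alpha}.$$


Table 2.3. $\mathrm{VaR}_\alpha$ and $\mathrm{ES}_\alpha$ in the normal and t models for different values of $\alpha$. 

$\alpha$                                 0.90    0.95    0.975   0.99    0.995 
$\mathrm{VaR}_\alpha$ (normal model)     162.1   208.1   247.9   294.3   325.8 
$\mathrm{VaR}_\alpha$ (t model)          137.1   190.7   248.3   335.1   411.8 
$\mathrm{ES}_\alpha$ (normal model)      222.0   260.9   295.7   337.1   365.8 
$\mathrm{ES}_\alpha$ (t model)           223.5   286.5   357.2   466.9   565.7 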


Example 2.15 (expected shortfall for the Student t loss distribution). Suppose 
the loss $L$ is such that $\tilde{L} = (L - \mu)/\sigma$ has a standard t distribution with $\nu$ degrees of 
freedom, as in Example 2.11. Suppose further that $\nu > 1$. By the reasoning of 
Example 2.14, which applies to any location-scale family, we have $\mathrm{ES}_\alpha = \mu + \sigma \mathrm{ES}_\alpha(\tilde{L})$. 
The ES of the standard t distribution is easily calculated by direct integration to be 

$$\mathrm{ES}_\alpha(\tilde{L}) = \frac{g_\nu(t_\nu^{-1}(\alpha))}{1 - \alpha} \left(\frac{\nu + (t_\nu^{-1}(\alpha))^2}{\nu - 1}\right), \tag{2.25}$$

where $t_\nu$ denotes the df and $g_\nu$ the density of standard t. 


Since $\mathrm{ES}_\alpha$ can be thought of as an average over all losses that are greater than 
or equal to $\mathrm{VaR}_\alpha$, it is sensitive to the severity of losses exceeding $\mathrm{VaR}_\alpha$. This 
advantage of ES is illustrated in the following example. 


Example 2.16 (VaR and ES for stock returns). We consider daily losses on a 
position in a particular stock; the current value of the position equals $V_t = 10\,000$. 
Recall from Example 2.1 that the loss for this portfolio is given by $L^{\Delta}_{t+1} = -V_t X_{t+1}$, 
where $X_{t+1}$ represents daily log-returns of the stock. We assume that $X_{t+1}$ has 
mean 0 and standard deviation $\sigma = 0.2/\sqrt{250}$, i.e. we assume that the stock has an 
annualized volatility of 20%. We compare two different models for the distribution: 
namely, (i) a normal distribution, and (ii) a t distribution with $\nu = 4$ degrees of 
freedom scaled to have standard deviation $\sigma$. The t distribution is a symmetric 
distribution with heavy tails, so that large absolute values are much more probable 
than in the normal model; it is also a distribution that has been shown to fit well in 
many empirical studies (see Example 6.14). In Table 2.3 we present $\mathrm{VaR}_\alpha$ and $\mathrm{ES}_\alpha$ 
for both models and various values of $\alpha$. In case (i) these values have been computed 
using (2.24); the ES for the t model has been computed using (2.25). 

Most risk managers would argue that the t model is riskier than the normal 
model, since under the t distribution large losses are more likely. However, if we use 
VaR at the 95% or 97.5% confidence level to measure risk, the normal distribution 
appears to be at least as risky as the t model; only above a confidence level of 
99% does the higher risk in the tails of the t model become apparent. On the other 
hand, if we use ES, the risk in the tails of the t model is reflected in our risk 
measurement for lower values of $\alpha$. Of course, simply going to a 99% confidence 
level in quoting VaR numbers does not help to overcome this deficiency of VaR, as 
there are other examples where the higher risk becomes apparent only for confidence 
levels beyond 99%. 
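
The entries of Table 2.3 can be recomputed from formulas (2.18), (2.19), (2.24) and (2.25). The following Python sketch does so with the standard library only. The t quantile is found by numerically integrating the density and inverting the df by bisection, which is our own illustrative shortcut; the scale parameter of the t model is chosen as $\sigma\sqrt{(\nu - 2)/\nu}$ so that its standard deviation equals $\sigma$.

```python
import math
from statistics import NormalDist

N = NormalDist()


def t_pdf(x, nu):
    """Density g_nu of the standard t distribution."""
    c = math.gamma((nu + 1) / 2) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2))
    return c * (1 + x * x / nu) ** (-(nu + 1) / 2)


def t_cdf(x, nu, steps=2000):
    """df t_nu at x >= 0 via Simpson's rule on [0, x] (steps must be even)."""
    if x == 0.0:
        return 0.5
    h = x / steps
    total = t_pdf(0.0, nu) + t_pdf(x, nu)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    return 0.5 + total * h / 3


def t_quantile(alpha, nu):
    """Quantile of t_nu for alpha >= 0.5, by bisection on t_cdf."""
    lo, hi = 0.0, 1.0
    while t_cdf(hi, nu) < alpha:  # expand the bracket
        hi *= 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, nu) < alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


V, NU = 10_000.0, 4
sd = V * 0.2 / math.sqrt(250)            # standard deviation of the daily loss
scale_t = sd * math.sqrt((NU - 2) / NU)  # t scale giving the same standard deviation


def row(alpha):
    """Return (VaR normal, VaR t, ES normal, ES t) for a loss with mean zero."""
    z, q = N.inv_cdf(alpha), t_quantile(alpha, NU)
    var_n = sd * z                                    # (2.18)
    var_t = scale_t * q                               # (2.19)
    es_n = sd * N.pdf(z) / (1 - alpha)                # (2.24)
    es_t = scale_t * t_pdf(q, NU) / (1 - alpha) * (NU + q * q) / (NU - 1)  # (2.25)
    return var_n, var_t, es_n, es_t


for a in (0.90, 0.95, 0.975, 0.99, 0.995):
    print(a, *(f"{v:7.1f}" for v in row(a)))
```

Each printed row lists the normal VaR, the t VaR, the normal ES and the t ES, in that order, and can be compared with the corresponding column of Table 2.3.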


Remark 2.17. It is possible to derive results on the asymptotics of the shortfall-to-quantile 
ratio $\mathrm{ES}_\alpha/\mathrm{VaR}_\alpha$ for $\alpha \to 1$. For the normal distribution we have 
$\lim_{\alpha \to 1} \mathrm{ES}_\alpha/\mathrm{VaR}_\alpha = 1$; for the t distribution with $\nu > 1$ degrees of freedom 
we have $\lim_{\alpha \to 1} \mathrm{ES}_\alpha/\mathrm{VaR}_\alpha = \nu/(\nu - 1) > 1$. This shows that for a heavy-tailed 
distribution, the difference between ES and VaR is more pronounced than for the 
normal distribution. We will take up this issue in more detail in Section 5.2.3 (see 
also Section 8.4.4). 
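
The two limits can be made plausible numerically. The following Python sketch evaluates the shortfall-to-quantile ratio for standard normal and standard t losses using (2.24) and (2.25). As a simplifying device of our own, the ratio is parametrized by the VaR level $q$ instead of $\alpha$, so that no quantile has to be computed; the tail probability $1 - \alpha$ is then obtained from $q$.

```python
import math


def ratio_normal(q: float) -> float:
    """ES/VaR for a standard normal loss whose VaR equals q > 0, via (2.24)."""
    density = math.exp(-q * q / 2) / math.sqrt(2 * math.pi)
    tail = 0.5 * math.erfc(q / math.sqrt(2))  # 1 - alpha, accurate in the far tail
    return density / (tail * q)


def t_pdf(x: float, nu: float) -> float:
    """Density of the standard t distribution with nu degrees of freedom."""
    c = math.gamma((nu + 1) / 2) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2))
    return c * (1 + x * x / nu) ** (-(nu + 1) / 2)


def ratio_t(q: float, nu: float, steps: int = 20_000) -> float:
    """ES/VaR for a standard t loss whose VaR equals q > 0, via (2.25)."""
    h = q / steps  # Simpson's rule for the probability mass on [0, q]
    total = t_pdf(0.0, nu) + t_pdf(q, nu)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    tail = 0.5 - total * h / 3  # 1 - alpha
    return t_pdf(q, nu) * (nu + q * q) / ((nu - 1) * tail * q)


for q in (2.0, 4.0, 10.0):
    print(f"normal, VaR = {q:4.1f}: ratio = {ratio_normal(q):.4f}")
for q in (2.0, 5.0, 20.0):
    print(f"t (nu = 4), VaR = {q:4.1f}: ratio = {ratio_t(q, 4):.4f}")
```

As the VaR level grows, the normal ratio approaches 1 while the t ratio with $\nu = 4$ approaches $4/3$, in line with the remark.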


2.3.5 Coherent and Convex Risk Measures 


The premise of this section is the idea of approaching risk measurement by first 
writing down a list of properties (axioms) that a good risk measure should have. 
For applications in risk management, such axioms have been proposed by Artzner 
et al. (1999) (coherent risk measures) and Föllmer and Schied (2002) (convex risk 
measures). In this section we discuss these axioms in relation to specific examples of 
risk measures. A longer and more theoretical treatment of coherent and convex risk 
measures will be given in Chapter 8. It should be mentioned that the idea of having 
axiomatic systems for risk measures bears some relationship to similar systems for 
premium principles in the actuarial literature, which have a long and independent 
history (see, for example, Goovaerts et al. (2003), as well as further references in 
the Notes and Comments section below). 


Axioms for risk measures. For the purposes of this section risk measures are 
real-valued functions defined on a linear space of random variables $\mathcal{M}$, assumed to 
include constants. There are two possible interpretations of the elements of M. 
First, elements of M could be considered as future net asset values of portfolios 
or positions; in that case, elements of M will be denoted by V and the current net 
asset value will be denoted by $V_0$. Second, elements of $\mathcal{M}$ could represent losses $L$, 
where, of course, these are related to future values by the formula $L = -(V - V_0)$ 
(ignoring any discounting for simplicity). 

Correspondingly, there are two possible notions of risk measures on M. On the 
one hand, we can view the risk measure as the amount of additional capital that 
needs to be added to a position with future net asset value V to make the position 
acceptable to a regulator or a prudent manager; in this case we write the risk measure
as $\tilde\varrho(V)$. On the other hand, we might interpret the risk measure as the
total amount of equity capital that is necessary to back a position with loss $L$; in
this case we write the risk measure as $\varrho(L)$.

These two notions are related by


$$\varrho(L) = V_0 + \tilde\varrho(V),$$

since “total capital” is equal to “available capital” plus “additional capital”. However,
these two different notions do have a bearing on the way in which the axioms are
presented and understood. Since in this book we mostly focus on loss distributions,
we present the axioms for losses and consider a risk measure
$\varrho : L \mapsto \varrho(L)$. Note that the alternative notion is frequently found
in the literature.


Axiom 2.18 (monotonicity). For $L_1, L_2 \in M$ such that $L_1 \le L_2$ almost surely,
we have $\varrho(L_1) \le \varrho(L_2)$.


From an economic viewpoint this axiom is obvious: positions that lead to higher
losses in every state of the world require more risk capital. Positions with
$\varrho(L) \le 0$ do not require any capital.


Axiom 2.19 (translation invariance). For all $L \in M$ and every $l \in \mathbb{R}$ we
have $\varrho(L + l) = \varrho(L) + l$.


Axiom 2.19 states that by adding or subtracting a deterministic quantity $l$ to a
position leading to the loss $L$, we alter our capital requirements by exactly that
amount. In terms of the alternative notion of a risk measure defined on future net
asset values, this axiom implies that $\tilde\varrho(V + k) = \tilde\varrho(V) - k$ for
$k \in \mathbb{R}$. It follows that $\tilde\varrho(V + \tilde\varrho(V)) = 0$, so a
position with future net asset value $V + \tilde\varrho(V)$ is immediately
acceptable without further injection of capital. This makes sense and implies that
risk is measured in monetary terms.


Axiom 2.20 (subadditivity). For all $L_1, L_2 \in M$ we have
$\varrho(L_1 + L_2) \le \varrho(L_1) + \varrho(L_2)$.


The rationale behind Axiom 2.20 is summarized by Artzner et al. (1999) in the 
statement that “a merger does not create extra risk” (ignoring, of course, any prob- 
lematic practical aspects of a merger!). Axiom 2.20 is the most debated of the four 
axioms characterizing coherent risk measures, probably because it rules out VaR as a 
risk measure in certain situations. We provide some arguments explaining why sub- 
additivity is indeed a reasonable requirement. First, subadditivity reflects the idea 
that risk can be reduced by diversification, a time-honoured principle in finance and 
economics. Second, if a regulator uses a non-subadditive risk measure in determin- 
ing the regulatory capital for a financial institution, that institution has an incentive 
to legally break up into various subsidiaries in order to reduce its regulatory capital 
requirements. Similarly, if the risk measure used by an organized exchange in deter- 
mining the margin requirements of investors is non-subadditive, an investor could 
reduce the margin he has to pay by opening a different account for every position 
in his portfolio. Finally, subadditivity makes decentralization of risk-management 
systems possible. Consider as an example two trading desks with positions leading
to losses $L_1$ and $L_2$. Imagine that a risk manager wants to ensure that
$\varrho(L)$, the risk of the overall loss $L = L_1 + L_2$, is smaller than some number
$M$. If he uses a subadditive risk measure $\varrho$, he may simply choose bounds $M_1$
and $M_2$ such that $M_1 + M_2 \le M$ and impose on each of the desks the constraint
that $\varrho(L_i) \le M_i$; subadditivity of $\varrho$ then automatically ensures that
$\varrho(L) \le M_1 + M_2 \le M$.


Axiom 2.21 (positive homogeneity). For all $L \in M$ and every $\lambda > 0$ we have
$\varrho(\lambda L) = \lambda\varrho(L)$.


Axiom 2.21 is easily justified if we assume that Axiom 2.20 holds. Subadditivity 
implies that, for n € N, 


$$\varrho(nL) = \varrho(L + \cdots + L) \le n\varrho(L).$$ (2.26)


Since there is no netting or diversification between the losses in this portfolio, it 
is natural to require that equality should hold in (2.26), which leads to positive 
homogeneity. Note that subadditivity and positive homogeneity imply that the risk
measure $\varrho$ is convex on M.
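The convexity claim follows in one line. For $\lambda \in (0, 1)$, apply subadditivity and then positive homogeneity (the boundary cases $\lambda = 0$ and $\lambda = 1$ are trivial); the short derivation below is added for completeness and uses only the axioms stated above.

```latex
\varrho(\lambda L_1 + (1-\lambda) L_2)
  \le \varrho(\lambda L_1) + \varrho((1-\lambda) L_2)      % subadditivity (Axiom 2.20)
  =   \lambda \varrho(L_1) + (1-\lambda) \varrho(L_2).     % positive homogeneity (Axiom 2.21)
```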


Definition 2.22 (coherent risk measure). A risk measure $\varrho$ whose domain includes
the convex cone M is called coherent (on M) if it satisfies Axioms 2.18-2.21.


Axiom 2.21 (positive homogeneity) has been criticized and, in particular, it has
been suggested that for large values of the multiplier $\lambda$ we should have
$\varrho(\lambda L) > \lambda\varrho(L)$ in order to penalize a concentration of risk
and to account for liquidity risk in a large position. As shown in (2.26), this is
impossible for a subadditive risk measure. This problem has led to the study of the
larger class of convex risk measures. In this class the conditions of subadditivity
and positive homogeneity have been relaxed; instead one requires only the weaker
property of convexity.


Axiom 2.23 (convexity). For all $L_1, L_2 \in M$ and all $\lambda \in [0, 1]$ we have
$\varrho(\lambda L_1 + (1 - \lambda)L_2) \le \lambda\varrho(L_1) + (1 - \lambda)\varrho(L_2)$.


The economic justification for convexity is again the idea that diversification 
reduces risk. 


Definition 2.24 (convex risk measure). A risk measure $\varrho$ on M is called convex
(on M) if it satisfies Axioms 2.18, 2.19 and 2.23.


While every coherent risk measure is convex, the converse is not true. In particular,
within the class of convex risk measures it is possible to find risk measures that
penalize concentration of risk in the sense that
$\varrho(\lambda L) > \lambda\varrho(L)$ for $\lambda > 1$ (see, for example,
Example 8.8). On the other hand, for risk measures that are positively homogeneous,
convexity and subadditivity are equivalent.


Examples. In view of its practical relevance we begin with a discussion of VaR. It is 
immediate from the definition of VaR as a quantile of the loss distribution that VaR is 
translation invariant, monotone and positive homogeneous. However, the following 
example shows that VaR is in general not subadditive, and hence, in general, neither 
is it a convex nor a coherent measure of risk. 


Example 2.25 (non-subadditivity of VaR for defaultable bonds). Consider a
portfolio of two zero-coupon bonds with a maturity of one year that default
independently. The default probability of both bonds is assumed to be identical and
equal to $p = 0.9\%$. The current price and the face value of each bond are both equal
to 100, and the bonds pay an interest rate of 5%. If there is no default, the holder
of the bond therefore receives a payment of size 105 in one year; in the case of a
default, he receives nothing, i.e. we assume a recovery rate of zero. Denote by $L_i$
the loss incurred by holding one unit of bond $i$. We have


$P(L_i = -5) = 1 - p = 0.991$ (no default),
$P(L_i = 100) = p = 0.009$ (default).


Set $\alpha = 0.99$. We have $P(L_i < -5) = 0$ and $P(L_i \le -5) = 0.991 > \alpha$, so
$\mathrm{VaR}_\alpha(L_i) = -5$.

Now consider a portfolio of one bond from each firm, with corresponding loss
$L = L_1 + L_2$. Since the default events of the two firms are independent, we get


$P(L = -10) = (1 - p)^2 = 0.982081$ (no default),
$P(L = 95) = 2p(1 - p) = 0.017838$ (one default),
$P(L = 200) = p^2 = 0.000081$ (two defaults).


In particular, $P(L \le -10) = 0.982081 < 0.99$ and $P(L \le 95) > 0.99$, so
$\mathrm{VaR}_\alpha(L) = 95 > -10 = \mathrm{VaR}_\alpha(L_1) + \mathrm{VaR}_\alpha(L_2)$.
Hence, VaR is non-subadditive. In fact, in the example, $\mathrm{VaR}_\alpha$ punishes
diversification, as


$$\mathrm{VaR}_\alpha(0.5L_1 + 0.5L_2) = 0.5\,\mathrm{VaR}_\alpha(L) = 47.5 > \mathrm{VaR}_\alpha(L_1).$$
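The arithmetic of Example 2.25 is easy to reproduce. The following sketch represents each discrete loss distribution as a dictionary from loss values to probabilities and computes VaR as the generalized inverse of the distribution function; the helper names `var_discrete` and `sum_independent` are hypothetical and introduced only for this illustration.

```python
from itertools import product


def var_discrete(dist, alpha):
    """Smallest loss l with P(L <= l) >= alpha for a discrete loss distribution."""
    cum = 0.0
    for loss, prob in sorted(dist.items()):
        cum += prob
        if cum >= alpha:
            return loss
    return max(dist)  # guard against rounding when the probabilities sum to just under 1


def sum_independent(d1, d2, w1=1.0, w2=1.0):
    """Distribution of w1*L1 + w2*L2 for independent discrete losses L1 and L2."""
    out = {}
    for (l1, p1), (l2, p2) in product(d1.items(), d2.items()):
        loss = w1 * l1 + w2 * l2
        out[loss] = out.get(loss, 0.0) + p1 * p2
    return out


p = 0.009                        # default probability from Example 2.25
alpha = 0.99
bond = {-5.0: 1.0 - p, 100.0: p}  # loss per bond: -5 (no default) or 100 (default)

portfolio = sum_independent(bond, bond)             # L = L1 + L2
half_half = sum_independent(bond, bond, 0.5, 0.5)   # 0.5*L1 + 0.5*L2

print("VaR of one bond:       ", var_discrete(bond, alpha))
print("VaR of L1 + L2:        ", var_discrete(portfolio, alpha))
print("VaR of 0.5*L1 + 0.5*L2:", var_discrete(half_half, alpha))
```

The single-bond VaR is negative because the default probability (0.9%) lies below $1 - \alpha$ (1%), whereas the probability of at least one default in the two-bond portfolio exceeds 1%.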


In Example 2.25 the non-subadditivity of VaR is caused by the fact that the assets 
making up the portfolio have very skewed loss distributions; such a situation can 
clearly occur if we have defaultable bonds or options in our portfolio. Note, however, 
that the assets in this example have an innocuous dependence structure because they 
are independent. 

In fact, the non-subadditivity of VaR can be seen in many different examples. The 
following is a list of the situations that we will encounter in this book. 


• Independent losses with highly skewed discrete distributions, as in
  Example 2.25.

• Independent losses with continuous light-tailed distributions but low values
  of $\alpha$. This will be demonstrated for exponentially distributed losses in
  Example 7.30 and discussed further in Section 8.3.3.

• Dependent losses with continuous symmetric distributions when the dependence
  structure takes a special form. This will be demonstrated for normally
  distributed losses in Example 8.39.

• Independent losses with continuous but very heavy-tailed distributions. This
  can be seen in Example 8.40 for the extreme case of infinite-mean Pareto
  risks. While less relevant for modelling market and credit risks, infinite-mean
  distributions are sometimes used to model certain kinds of insurance losses
  as well as losses due to operational risk (see Chapter 13 for more discussion).


Note that the domain M is an integral part of the definition of a convex or coherent 
risk measure. We will often encounter risk measures that are coherent or convex if
restricted to a sufficiently small domain. For example, VaR is subadditive in the
idealized situation where all portfolios can be represented as linear combinations 
of the same set of underlying multivariate normal or, more generally, elliptically 
distributed risk factors (see Proposition 8.28). 

There is an ongoing debate about the practical relevance of the non-subadditivity 
of VaR. The non-subadditivity can be particularly problematic if VaR is used to set 
risk limits for traders, as this can lead to portfolios with a high degree of concentration 
risk. Consider, for instance, in the set-up of Example 2.25 a trader who wants to 
maximize the expected return of a portfolio in the two defaultable bonds under the 
constraint that the VaR of his position is smaller than some given positive number; 
no short selling is permitted. Clearly, an optimal strategy for this trader is to invest 
all funds in one of the two bonds, a very concentrated position. For an elaboration 
of this toy example we refer to Frey and McNeil (2002). 


Example 2.26 (coherence of expected shortfall). ES, on the other hand, is a coher- 
ent risk measure. Translation invariance, monotonicity and positive homogeneity are 
immediate from the corresponding properties of the quantile. For instance, it holds 
that 


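$$\mathrm{ES}_\alpha(\lambda L) = \frac{1}{1-\alpha}\int_\alpha^1 q_u(\lambda L)\,du
  = \frac{1}{1-\alpha}\int_\alpha^1 \lambda q_u(L)\,du = \lambda\,\mathrm{ES}_\alpha(L),$$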


and similar arguments apply for translation invariance and monotonicity. A general
proof of subadditivity is given in Theorem 8.14. Here, we give a simple argument
for the case where $L_1$, $L_2$ and $L_1 + L_2$ have a continuous distribution. We
recall from Lemma 2.13 that for a random variable $L$ with a continuous distribution,
it holds that


$$\mathrm{ES}_\alpha(L) = \frac{1}{1-\alpha}\,E\bigl(L\,I_{\{L \ge q_\alpha(L)\}}\bigr).$$


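To simplify the notation let $I_1 := I_{\{L_1 \ge q_\alpha(L_1)\}}$,
$I_2 := I_{\{L_2 \ge q_\alpha(L_2)\}}$ and
$I_{12} := I_{\{L_1 + L_2 \ge q_\alpha(L_1 + L_2)\}}$. We calculate that

$$\begin{aligned}
(1-\alpha)\bigl(\mathrm{ES}_\alpha(L_1) + \mathrm{ES}_\alpha(L_2) - \mathrm{ES}_\alpha(L_1 + L_2)\bigr)
  &= E(L_1 I_1) + E(L_2 I_2) - E((L_1 + L_2) I_{12}) \\
  &= E(L_1(I_1 - I_{12})) + E(L_2(I_2 - I_{12})).
\end{aligned}$$

Consider the first term and suppose that $L_1 \ge q_\alpha(L_1)$. It follows that
$I_1 - I_{12} \ge 0$ and hence that $L_1(I_1 - I_{12}) \ge q_\alpha(L_1)(I_1 - I_{12})$.
Suppose, on the other hand, that $L_1 < q_\alpha(L_1)$. It follows that
$I_1 - I_{12} \le 0$ and hence that $L_1(I_1 - I_{12}) \ge q_\alpha(L_1)(I_1 - I_{12})$.
The same reasoning applies to $L_2$, so in either case we conclude that

$$\begin{aligned}
(1-\alpha)\bigl(\mathrm{ES}_\alpha(L_1) + \mathrm{ES}_\alpha(L_2) - \mathrm{ES}_\alpha(L_1 + L_2)\bigr)
  &\ge E(q_\alpha(L_1)(I_1 - I_{12})) + E(q_\alpha(L_2)(I_2 - I_{12})) \\
  &= q_\alpha(L_1)E(I_1 - I_{12}) + q_\alpha(L_2)E(I_2 - I_{12}) \\
  &= q_\alpha(L_1)((1-\alpha) - (1-\alpha)) + q_\alpha(L_2)((1-\alpha) - (1-\alpha)) \\
  &= 0,
\end{aligned}$$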


which proves subadditivity. 
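As a numerical complement, ES can be evaluated for the discrete bond losses of Example 2.25, where VaR failed to be subadditive. Because those distributions are not continuous, the sketch below does not use the conditional-expectation formula from the proof; it uses the quantile-integral form $\mathrm{ES}_\alpha(L) = (1-\alpha)^{-1}\int_\alpha^1 q_u(L)\,du$ that appears in the homogeneity calculation above. The function names are hypothetical.

```python
from itertools import product


def es_discrete(dist, alpha):
    """ES_alpha = (1 - alpha)^{-1} * integral of the quantile function over (alpha, 1)."""
    tail, cum = 0.0, 0.0
    for loss, prob in sorted(dist.items()):
        lower, cum = cum, cum + prob
        # The quantile equals `loss` on (lower, cum]; keep the part above alpha.
        overlap = max(0.0, min(cum, 1.0) - max(lower, alpha))
        tail += loss * overlap
    return tail / (1.0 - alpha)


def sum_independent(d1, d2):
    """Distribution of L1 + L2 for independent discrete losses."""
    out = {}
    for (l1, p1), (l2, p2) in product(d1.items(), d2.items()):
        out[l1 + l2] = out.get(l1 + l2, 0.0) + p1 * p2
    return out


p, alpha = 0.009, 0.99
bond = {-5.0: 1.0 - p, 100.0: p}
portfolio = sum_independent(bond, bond)

es_bond = es_discrete(bond, alpha)
es_portfolio = es_discrete(portfolio, alpha)
print(f"ES of one bond: {es_bond:.4f}")
print(f"ES of L1 + L2:  {es_portfolio:.4f}  (bound 2 * ES of one bond = {2 * es_bond:.4f})")
```

Here the portfolio ES stays well below the sum of the stand-alone values, in line with subadditivity.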

We have now seen two advantages of ES over VaR: ES reflects tail risk better (see
Example 2.16) and it is always subadditive. On the other hand, VaR has practical 
advantages: it is easier to estimate, in particular for heavy-tailed distributions, and 
VaR estimates are easier to backtest than estimates of ES. We will come back to this 
point in our discussion of backtesting in Section 9.3. 


Example 2.27 (generalized scenario risk measure). The generalized scenario 
risk measure in (2.16) is another example of a coherent risk measure. Translation 
invariance, positive homogeneity and monotonicity are clear, so it only remains to 
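check subadditivity. For $i = 1, 2$ denote by $L_i(x)$ the hypothetical loss of position
$i$ under the scenario $x$ for the risk-factor changes. We observe that


$$\begin{aligned}
&\max\{E^P(L_1(X) + L_2(X)) : P \in \mathcal{P}_{[\mathcal{X},w]}\} \\
&\quad \le \max\{E^P(L_1(X)) : P \in \mathcal{P}_{[\mathcal{X},w]}\}
         + \max\{E^P(L_2(X)) : P \in \mathcal{P}_{[\mathcal{X},w]}\}.
\end{aligned}$$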


Example 2.28 (a coherent premium principle). In Fischer (2003), a class of 
coherent risk measures closely resembling certain actuarial premium principles is 
proposed. These risk measures are potentially useful for an insurance company that 
wants to compute premiums on a coherent basis without deviating too far from 
standard actuarial practice. 

Given constants $p > 1$ and $\alpha \in [0, 1)$, this coherent premium principle
$\varrho_{[\alpha,p]}$ is defined as follows. Let $M := L^p(\Omega, \mathcal{F}, P)$, the
space of all $L$ with $\|L\|_p := E(|L|^p)^{1/p} < \infty$, and define, for $L \in M$,


$$\varrho_{[\alpha,p]}(L) = E(L) + \alpha\,\|(L - E(L))^+\|_p.$$ (2.27)


Under (2.27) the risk associated with a loss $L$ is measured by the sum of $E(L)$,
the pure actuarial premium for the loss, and a risk loading given by a fraction
$\alpha$ of the $L^p$-norm of the positive part of the centred loss $L - E(L)$. This
loading can be written more explicitly as
$\bigl(\int_{E(L)}^\infty (l - E(L))^p \, dF_L(l)\bigr)^{1/p}$. The higher the values of
$\alpha$ and $p$, the more conservative the risk measure $\varrho_{[\alpha,p]}$ becomes.

The coherence of $\varrho_{[\alpha,p]}$ is easy to check. Translation invariance and
positive homogeneity are immediate. To prove subadditivity observe that for any two
rvs $X$ and $Y$ we have $(X + Y)^+ \le X^+ + Y^+$. Hence, from Minkowski’s inequality
(the triangle inequality for the $L^p$-norm) we obtain that for any two
$L_1, L_2 \in M$,


$$\begin{aligned}
\|(L_1 - E(L_1) + L_2 - E(L_2))^+\|_p
  &\le \|(L_1 - E(L_1))^+ + (L_2 - E(L_2))^+\|_p \\
  &\le \|(L_1 - E(L_1))^+\|_p + \|(L_2 - E(L_2))^+\|_p,
\end{aligned}$$


which shows that $\varrho_{[\alpha,p]}$ is subadditive. To verify monotonicity, assume
that $L_1 \le L_2$ almost surely and write $L = L_1 - L_2$. Since $L \le 0$ almost
surely, it follows that $(L - E(L))^+ \le -E(L)$ almost surely, and hence that
$\|(L - E(L))^+\|_p \le -E(L)$ and $\varrho_{[\alpha,p]}(L) \le 0$, since $\alpha < 1$.
Using the fact that $L_1 = L_2 + L$ and the subadditivity property we obtain


$$\varrho_{[\alpha,p]}(L_1) \le \varrho_{[\alpha,p]}(L_2) + \varrho_{[\alpha,p]}(L) \le \varrho_{[\alpha,p]}(L_2).$$
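To make (2.27) concrete, the sketch below evaluates the premium principle on a finite set of equally likely scenarios, so that expectations become plain averages. The scenario vectors, the parameter values (0.5 for the loading fraction and 2 for the exponent) and the function name `premium` are illustrative assumptions, not taken from Fischer (2003).

```python
def premium(losses, a, p):
    """E(L) + a * ||(L - E(L))^+||_p under equally likely scenarios."""
    n = len(losses)
    mean = sum(losses) / n
    upper = sum(max(x - mean, 0.0) ** p for x in losses) / n  # E[((L - E L)^+)^p]
    return mean + a * upper ** (1.0 / p)


a, p = 0.5, 2.0                      # illustrative constants with 0 <= a < 1, p > 1
L1 = [-5.0, -5.0, 100.0, -5.0]       # losses of position 1 in four scenarios
L2 = [-5.0, 100.0, -5.0, -5.0]       # losses of position 2 in the same scenarios
L3 = [0.0, -5.0, 100.0, 50.0]        # dominates L1 scenario by scenario
L12 = [x + y for x, y in zip(L1, L2)]

print("rho(L1)      =", premium(L1, a, p))
print("rho(L2)      =", premium(L2, a, p))
print("rho(L1 + L2) =", premium(L12, a, p))
print("rho(L3)      =", premium(L3, a, p))
```

On this toy data the four coherence properties can be checked directly: shifting or scaling the scenario losses shifts or scales the premium, the merged position costs no more than the two parts, and the dominated position `L1` receives the smaller premium.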


Notes and Comments


An extensive discussion of different approaches to risk quantification is given in 
Crouhy, Galai and Mark (2001). Value-at-risk was introduced by JPMorgan in the 
first version of its RiskMetrics system and was quickly accepted by risk managers 
and regulators as an industry standard; see also Brown (2012) for a broader view of 
the history and use of VaR on Wall Street. A number of different notions of VaR are 
used in practice (see Alexander 2009, Volume 4) but all are related to the idea of a 
quantile of the P&L distribution. 

Expected shortfall was made popular by Artzner et al. (1997, 1999). There are a 
number of variants of the ES risk measure with a variety of names, such as tail condi- 
tional expectation, worst conditional expectation and conditional VaR; all coincide 
for continuous loss distributions. Acerbi and Tasche (2002) discuss the relationships 
between the various notions. Risk measures based on loss distributions also appear 
in the literature under the (somewhat unfortunate) heading of law-invariant risk 
measures. 

Example 2.25 is due to Albanese (1997) and Artzner et al. (1999). There are 
many different examples of the non-subadditivity of VaR in the literature, including 
the case of independent, infinite-mean Pareto risks (see Embrechts, McNeil and 
Straumann 2002, Example 7; Denuit and Charpentier 2004, Example 5.2.7). The 
implications of the non-subadditivity of VaR for portfolio optimization are discussed 
in Frey and McNeil (2002); see also papers by Basak and Shapiro (2001), Krokhmal, 
Palmquist and Uryasev (2001) and Emmer, Klüppelberg and Korn (2001).

A class of risk measures that are widely used throughout the hedge fund industry is 
based on the peak-to-bottom loss over a given period of time in the performance curve 
of an investment. These measures are typically referred to as (maximal) drawdown 
risk measures (see, for example, Chekhlov, Uryasev and Zabarankin 2005; Jaeger 
2005). 

The measurement of financial risk and the computation of actuarial premiums are 
at least conceptually closely related problems, so that the actuarial literature on pre- 
mium principles is of relevance in financial risk management. We refer to Chapter 3 
of Rolski et al. (1999) for an overview; Goovaerts, De Vylder and Haezendonck 
(1984) provides a specialist account. 

Model risk has become a central issue in modern risk management. The problems 
faced by the hedge fund LTCM in 1998 provide a prime example of model risk in 
VaR-based risk-management systems. While LTCM had a seemingly sophisticated 
VaR system in place, errors in parameter estimation, unexpectedly large market 
moves (heavy tails) and, in particular, vanishing market liquidity drove the hedge 
fund into near-bankruptcy, causing major financial turbulence around the globe. 
Jorion (2000) contains an excellent discussion of the LTCM case, in particular 
comparing a Gaussian-based VaR model with a t-based approach. At a more general 
level, Jorion (2002a) discusses the various fallacies surrounding VaR-based market- 
risk-management systems. 


Empirical Properties of Financial Data


In Chapter 2 we saw that the risk that a financial portfolio loses value over a given time 
period can be modelled in terms of changes in fundamental underlying risk factors, 
such as equity prices, interest rates and foreign exchange rates (see, in particular, the 
examples of Section 2.2.1). To build realistic models for risk-management purposes 
we need to consider the empirical properties of fundamental risk factors and develop 
models that share these properties. 

In this chapter we first consider the univariate properties of single time series 
of risk-factor changes in Section 3.1, before reviewing some of the properties of 
multivariate series in Section 3.2. The features we describe motivate the statistical 
methodology of Part II of this book. 


3.1 Stylized Facts of Financial Return Series 


The stylized facts of financial time series are a collection of empirical observations, 
and inferences drawn from these observations, that apply to many time series of 
risk-factor changes, such as log-returns on equities, indices, exchange rates and 
commodity prices; these observations are now so entrenched in econometric expe- 
rience that they have been accorded the status of facts. 

The stylized facts that we describe typically apply to time series of daily log- 
returns and often continue to hold when we consider longer-interval series, such 
as weekly or monthly returns, or shorter-interval series, such as intra-daily returns. 
Most risk-management models are based on data collected at these frequencies. 

Very-high-frequency financial time series, such as tick-by-tick data, have their 
own stylized facts, but this will not be a subject of this chapter. Moreover, the 
properties of very-low-frequency data (such as annual returns) are more difficult to 
pin down, due to the sparseness of such data and the difficulty of assuming that they 
are generated under long-term stationary regimes. 

For a single time series of financial returns, a version of the stylized facts is as 
follows. 


(1) Return series are not independent and identically distributed (iid), although 
they show little serial correlation. 


(2) Series of absolute or squared returns show profound serial correlation. 


(3) Conditional expected returns are close to zero. 

(4) Volatility appears to vary over time.


(5) Extreme returns appear in clusters.


(6) Return series are leptokurtic or heavy tailed. 


To discuss these observations further we denote a return series by $X_1, \dots, X_n$ and
assume that the returns have been formed by logarithmic differencing of a price,
index or exchange-rate series $(S_t)_{t=0,1,\dots,n}$, so $X_t = \ln(S_t/S_{t-1})$,
$t = 1, \dots, n$.
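In code, logarithmic differencing is a one-liner. The sketch below uses a short made-up price list purely for illustration; the function name is hypothetical.

```python
from math import log


def log_returns(prices):
    """X_t = ln(S_t / S_{t-1}) for t = 1, ..., n, given prices S_0, ..., S_n."""
    return [log(s1 / s0) for s0, s1 in zip(prices, prices[1:])]


prices = [100.0, 110.0, 99.0, 104.0]   # hypothetical price series S_0, ..., S_3
returns = log_returns(prices)
print(returns)
```

A convenient property of log-returns is that they add up over time: the sum of the series equals $\ln(S_n/S_0)$.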


3.1.1 Volatility Clustering 


Evidence for the first two stylized facts is collected in Figures 3.1 and 3.2. Fig- 
ure 3.1 (a) shows 2608 daily log-returns for the DAX index spanning a decade from 
2 January 1985 to 30 December 1994, a period including both the stock market 
crash of 1987 and the reunification of Germany. Parts (b) and (c) show series of 
simulated iid data from a normal model and a Student t model, respectively; in both 
cases the model parameters have been set by fitting the model to the real return data 
using the method of maximum likelihood under the iid assumption. In the normal
case, this means that we simply simulate iid data with distribution
$N(\hat\mu, \hat\sigma^2)$, where $\hat\mu = \bar{X} = n^{-1}\sum_{i=1}^n X_i$ and
$\hat\sigma^2 = n^{-1}\sum_{i=1}^n (X_i - \bar{X})^2$. In the t case, the likelihood
has been maximized numerically and the estimated degrees of freedom parameter
is $\hat\nu = 3.8$.

The simulated normal data are clearly very different from the DAX return data and 
do not show the same range of extreme values. While the Student t model can gen- 
erate comparable extreme values to the real data, more careful observation reveals 
that the real returns exhibit a phenomenon known as volatility clustering, which is 
not present in the simulated series. Volatility clustering is the tendency for extreme 
returns to be followed by other extreme returns, although not necessarily with the 
same sign. We can see periods such as the stock market crash of October 1987 or 
the political uncertainty in the period between late 1989 and German reunification 
in 1990 in the DAX data: they are marked by large positive and negative moves. 

In Figure 3.2 the correlograms of the raw data and the absolute data for all three 
data sets are shown. The correlogram is a graphical display for estimates of serial 
correlation, and its construction and interpretation are discussed in Section 4.1.3. 
While there is very little evidence of serial correlation in the raw data for all data sets, 
the absolute values of the real financial data appear to show evidence of serial 
dependence. Clearly, more than 5% of the estimated correlations lie outside the 
dashed lines, which are the 95% confidence intervals for serial correlations when 
the underlying process consists of iid finite-variance rvs. This serial dependence 
in the absolute returns would be equally apparent in squared return values, and it 
seems to confirm the presence of volatility clustering. We conclude that, although 
there is no evidence against the iid hypothesis for the genuinely iid data, there is 
strong evidence against the iid hypothesis for the DAX return data. 
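The correlogram comparison can be imitated on simulated data. The sketch below computes sample autocorrelations and counts how many lags fall outside the band $\pm 1.96/\sqrt{n}$, which is the usual large-sample 95% band for iid finite-variance data. Since the DAX data are not reproduced here, a toy GARCH(1,1) series stands in for real returns; its parameter values, the seeds and all function names are assumptions made for illustration.

```python
import random
from math import sqrt


def sample_acf(x, max_lag):
    """Sample autocorrelations rho(1), ..., rho(max_lag) of the series x."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t + h] for t in range(n - h)) / denom
            for h in range(1, max_lag + 1)]


def share_outside_band(x, max_lag=30):
    """Fraction of lags whose sample autocorrelation leaves +/- 1.96 / sqrt(n)."""
    band = 1.96 / sqrt(len(x))
    return sum(abs(r) > band for r in sample_acf(x, max_lag)) / max_lag


def simulate_garch(n, omega=1e-6, a=0.10, b=0.85, seed=1):
    """Toy GARCH(1,1) returns; the parameter values are illustrative assumptions."""
    rng = random.Random(seed)
    var = omega / (1.0 - a - b)          # start at the stationary variance
    x = []
    for _ in range(n):
        r = sqrt(var) * rng.gauss(0.0, 1.0)
        x.append(r)
        var = omega + a * r * r + b * var
    return x


rng = random.Random(0)
iid = [rng.gauss(0.0, 1.0) for _ in range(2608)]
clustered = simulate_garch(2608)
for name, series in (("iid normal", iid), ("toy GARCH", clustered)):
    raw = share_outside_band(series)
    absolute = share_outside_band([abs(v) for v in series])
    print(f"{name}: share of lags outside band, raw {raw:.2f}, absolute {absolute:.2f}")
```

For iid data roughly 5% of lags should leave the band by chance, for raw and absolute values alike; a volatility-clustering series would be expected to show a much larger share for the absolute values.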

Figure 3.1. (a) Log-returns for the DAX index from 2 January 1985 to 30 December 1994
compared with simulated iid data from (b) a normal and (c) a t distribution, where the
parameters have been determined by fitting the models to the DAX data.

Table 3.1 contains more evidence against the iid hypothesis for daily stock-return
data. The Ljung–Box test of randomness (described in Section 4.1.3) has been
performed for the stocks comprising the Dow Jones 30 index in the period 1993-2000.
In the two columns for daily returns the test is applied, respectively, to the raw return
data (LBraw) and their absolute values (LBabs), and p-values are tabulated; these
show strong evidence (particularly when applied to absolute values) against the
iid hypothesis. If financial log-returns are not iid, then this contradicts the popular
random-walk hypothesis for the discrete-time development of log-prices (or, in this
case, index values). If log-returns are neither iid nor normal, then this contradicts
the geometric Brownian motion hypothesis for the continuous-time development of
prices on which the Black–Scholes–Merton pricing theory is based.
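For readers who want to experiment before reaching Section 4.1.3, here is a minimal sketch of the Ljung–Box statistic in its standard form, $Q = n(n+2)\sum_{k=1}^{h} \hat\rho(k)^2/(n-k)$, which is compared with a chi-squared distribution with $h$ degrees of freedom under the iid null. The lag count `h = 10` and the simulated input are assumptions made here (the text does not state the settings behind Table 3.1), and the p-value routine covers even degrees of freedom only, where the chi-squared tail has an elementary closed form.

```python
import random
from math import exp


def sample_acf(x, max_lag):
    """Sample autocorrelations rho(1), ..., rho(max_lag) of the series x."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t + h] for t in range(n - h)) / denom
            for h in range(1, max_lag + 1)]


def ljung_box(x, h=10):
    """Ljung-Box statistic Q = n(n+2) * sum_k rho(k)^2 / (n - k), k = 1..h."""
    n = len(x)
    acf = sample_acf(x, h)
    return n * (n + 2) * sum(r * r / (n - k) for k, r in enumerate(acf, start=1))


def chi2_sf_even(q, df):
    """P(chi-squared_df > q) for even df: exp(-q/2) * sum_{j < df/2} (q/2)^j / j!."""
    if df % 2 != 0:
        raise ValueError("this closed form requires an even number of degrees of freedom")
    term, total = 1.0, 1.0
    for j in range(1, df // 2):
        term *= (q / 2.0) / j
        total += term
    return exp(-q / 2.0) * total


rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(2000)]   # simulated iid stand-in for returns
for label, series in (("raw", x), ("absolute", [abs(v) for v in x])):
    q = ljung_box(series, h=10)
    print(f"{label}: Q = {q:.2f}, p-value = {chi2_sf_even(q, 10):.3f}")
```

Applying the same two calls to real daily returns and to their absolute values gives the kind of comparison reported in the LBraw and LBabs columns.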

Moreover, if there is serial dependence in financial return data, then the question
arises: to what extent can this dependence be used to make predictions about the
future? This is the subject of the third and fourth stylized facts. It is very difficult
to predict the return in the next time period based on historical data alone. This
difficulty in predicting future returns is part of the evidence for the well-known
efficient markets hypothesis in finance, which says that prices react quickly to reflect
all the available information about the asset in question.

Figure 3.2. Correlograms for (a) the three data sets in Figure 3.1 and (b) the absolute values
of these data. Dotted lines mark the standard 95% confidence intervals for the autocorrelations
of a process of iid finite-variance rvs.

In empirical terms, the lack of predictability of returns is shown by a lack of 
serial correlation in the raw return series data. For some series we do sometimes see 
evidence of correlations at the first lag (or first few lags). A small positive correlation 
at the first lag would suggest that there is some discernible tendency for a return 
with a particular sign (positive or negative) to be followed in the next period by a 
return with the same sign. However, this is not apparent in the DAX data, which 
suggests that our best estimate for tomorrow’s return based on our observations up 
to today is zero. This idea is expressed in the assertion of the third stylized fact: that 
conditional expected returns are close to zero. 

Volatility is often formally modelled as the conditional standard deviation of
financial returns given historical information, and, although the conditional expected
returns are consistently close to zero, the presence of volatility clustering suggests
that conditional standard deviations are continually changing in a partly predictable
manner. If we know that returns have been large in the last few days, due to
market excitement, then there is reason to believe that the distribution from which
tomorrow’s return is “drawn” should have a large variance. It is this idea that
lies behind the time-series models for changing volatility that we will examine
in Chapter 4.

Table 3.1. Tests of randomness for returns of Dow Jones 30 stocks in the eight-year period
1993-2000. The columns LBraw and LBabs show p-values for Ljung–Box tests applied to
the raw and absolute values, respectively.

                                      Daily            Monthly
Name                   Symbol     LBraw  LBabs      LBraw  LBabs
Alcoa                  AA          0.00   0.00       0.23   0.02
American Express       AXP         0.02   0.00       0.55   0.07
AT&T                   T           0.11   0.00       0.70   0.02
Boeing                 BA          0.03   0.00       0.90   0.17
Caterpillar            CAT         0.28   0.00       0.73   0.07
Citigroup              C           0.09   0.00       0.91   0.48
Coca-Cola              KO          0.00   0.00       0.50   0.03
DuPont                 DD          0.03   0.00       0.75   0.00
Eastman Kodak          EK          0.15   0.00       0.61   0.54
Exxon Mobil            XOM         0.00   0.00       0.32   0.22
General Electric       GE          0.00   0.00       0.25   0.09
General Motors         GM          0.65   0.00       0.81   0.27
Hewlett-Packard        HWP         0.09   0.00       0.21   0.02
Home Depot             HD          0.00   0.00       0.00   0.41
Honeywell              HON         0.44   0.00       0.07   0.30
Intel                  INTC        0.23   0.00       0.79   0.62
IBM                    IBM         0.18   0.00       0.67   0.28
International Paper    IP          0.15   0.00       0.01   0.09
JPMorgan               JPM         0.52   0.00       0.43   0.12
Johnson & Johnson      JNJ         0.00   0.00       0.11   0.91
McDonald’s             MCD         0.28   0.00       0.72   0.68
Merck                  MRK         0.05   0.00       0.53   0.65
Microsoft              MSFT        0.28   0.00       0.19   0.13
3M                     MMM         0.00   0.00       0.57   0.33
Philip Morris          MO          0.01   0.00       0.68   0.82
Procter & Gamble       PG          0.02   0.00       0.99   0.74
SBC                    SBC         0.05   0.00       0.13   0.00
United Technologies    UTX         0.00   0.00       0.12   0.01
Wal-Mart               WMT         0.00   0.00       0.41   0.64
Disney                 DIS         0.44   0.00       0.01   0.51

Figure 3.3. Time-series plots of the 100 largest negative values for (a) the DAX returns
and (c) the simulated t data as well as (b), (d) Q-Q plots of the waiting times between these
extreme values against an exponential reference distribution.

Further evidence for volatility clustering is given in Figure 3.3, where the time
series of the 100 largest daily losses for the DAX returns and the 100 largest
values for the simulated t data are plotted. In Section 5.3.1 we summarize the theory
that suggests that the very largest values in iid data will occur like events in a
Poisson process, separated by waiting times that are iid with an exponential
distribution. Parts (b) and (d) of the figure show Q-Q plots of these waiting times
against an exponential reference distribution. While the hypothesis of the Poisson
occurrence of extreme values for the iid data is supported, there are too many
short waiting times and long waiting times caused by the clustering of extreme
values in the DAX data to support the exponential hypothesis. The fifth stylized fact
therefore constitutes further strong evidence against the iid hypothesis for return
data.
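The waiting-time diagnostic can be sketched as follows: find the time indices of the $k$ largest losses, take the gaps between consecutive indices, and pair the ordered gaps with quantiles of an exponential distribution that has the same mean. The plotting positions $(i - 0.5)/m$, the function names and the simulated input are assumptions of this illustration, not details taken from the figure.

```python
import random
from math import log


def exceedance_gaps(losses, k):
    """Waiting times (in observations) between the k largest losses."""
    top = sorted(range(len(losses)), key=lambda t: losses[t], reverse=True)[:k]
    times = sorted(top)
    return [t1 - t0 for t0, t1 in zip(times, times[1:])]


def exponential_qq(gaps):
    """Pairs (exponential quantile, ordered gap) for a Q-Q plot, matching the sample mean."""
    m = len(gaps)
    mean = sum(gaps) / m
    return [(-mean * log(1.0 - (i - 0.5) / m), g)
            for i, g in enumerate(sorted(gaps), start=1)]


rng = random.Random(0)
simulated_losses = [rng.gauss(0.0, 1.0) for _ in range(2608)]  # iid stand-in data
pairs = exponential_qq(exceedance_gaps(simulated_losses, 100))
for theoretical, observed in pairs[::20]:
    print(f"exponential quantile {theoretical:8.2f}   observed gap {observed:4d}")
```

If the pairs lie close to the diagonal, the exponential hypothesis is plausible; clustered extremes would show up as an excess of very short and very long gaps.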

In Chapter 4 we will introduce time-series models that have the volatility clus- 
tering behaviour that we observe in real return data. In particular, we will describe 
ARCH and GARCH models, which can replicate all of the stylized facts we have 
discussed so far, as well as the typical non-normality of returns addressed by the 
sixth stylized fact, to which we now turn. 

Figure 3.4. A Q-Q plot of daily returns of the Disney share price
from 1993 to 2000 against a normal reference distribution.


3.1.2 Non-normality and Heavy Tails 


The normal distribution is frequently observed to be a poor model for daily, weekly 
and even monthly returns. This can be confirmed using various well-known tests of 
normality, including the Q-Q plot against a standard normal reference distribution, 
as well as a number of formal numerical tests. 

A Q-Q plot (quantile–quantile plot) is a standard visual tool for showing the
relationship between empirical quantiles of the data and theoretical quantiles of a 
reference distribution. A lack of linearity in the Q-Q plot is interpreted as evidence 
against the hypothesized reference distribution. In Figure 3.4 we show a Q-Q plot of 
daily returns of the Disney share price from 1993 to 2000 against a normal reference 
distribution; the inverted S-shaped curve of the points suggests that the more extreme 
empirical quantiles of the data tend to be larger than the corresponding quantiles 
of a normal distribution, indicating that the normal distribution is a poor model for 
these returns. 

Common numerical tests include those of Jarque and Bera, Anderson and Darling, 
Shapiro and Wilk, and D’ Agostino. The Jarque—Bera test belongs to the class of 
omnibus moment tests, i.e. tests that assess simultaneously whether the skewness 
and kurtosis of the data are consistent with a Gaussian model. The sample skewness 
and kurtosis coefficients are defined by 


A/a) Vi OG — XP p Ll Lia % = XY" 
(a/m Diz (Xi — X332 (4/0 Diz (Xi -5 
These are designed to estimate the theoretical skewness and kurtosis, which are 
defined, respectively, by 8 = E(X — u)? /o? and k = E(X — p)*/o4, where 
u = E(X) and o? = var(X) denote mean and variance; 6 and x take the values 
zero and three for a normal variate X. The Jarque—Bera test statistic is 


(3.1) 


T = gn(b* + 4(k — 3)°), (3.2) 
Table 3.2. Sample skewness (b) and kurtosis (k) coefficients as well as p-values for Jarque-Bera
tests of normality for an arbitrary set of ten of the Dow Jones 30 stocks (see Table 3.1
for names of stocks).

         Daily returns, n = 2020        Weekly returns, n = 416
Stock        b        k   p-value          b        k   p-value
AXP       0.05     5.09      0.00      -0.01     3.91      0.00
EK       -1.93    31.20      0.00      -1.13    14.40      0.00
BA       -0.34    10.89      0.00      -0.26     7.54      0.00
C         0.21     5.93      0.00       0.44     5.42      0.00
KO       -0.02     6.36      0.00      -0.21     4.37      0.00
MSFT     -0.22     8.04      0.00      -0.14     5.25      0.00
HWP      -0.23     6.69      0.00      -0.26     4.66      0.00
INTC     -0.56     8.29      0.00      -0.65     5.20      0.00
JPM       0.14     5.25      0.00      -0.20     4.93      0.00
DIS      -0.01     9.39      0.00       0.08     4.48      0.00

         Monthly returns, n = 96        Quarterly returns, n = 32
Stock        b        k   p-value          b        k   p-value
AXP      -1.22     5.99      0.00      -1.04     4.88      0.01
EK       -1.52    10.37      0.00      -0.63     4.49      0.08
BA       -0.50     4.15      0.01      -0.15     6.23      0.00
C        -1.10     7.38      0.00      -1.61     7.13      0.00
KO       -0.49     3.68      0.06      -1.45     5.21      0.00
MSFT     -0.40     3.90      0.06      -0.56     2.90      0.43
HWP      -0.33     3.47      0.27      -0.38     3.64      0.52
INTC     -1.04     6.50      0.00      -0.42     3.10      0.62
JPM      -0.51     5.40      0.00      -0.78     7.26      0.00
DIS       0.04     3.26      0.87      -0.49     4.32      0.16

and it has an asymptotic chi-squared distribution with two degrees of freedom under 
the null hypothesis of normality; sample kurtosis values differing widely from three 
and skewness values differing widely from zero may lead to rejection of normality. 
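
The calculation in (3.1) and (3.2) can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function name jarque_bera is our own choice, and the p-value relies on the fact that a chi-squared variable with two degrees of freedom has survival function exp(-x/2), so that no statistical library is needed.

```python
import math
import random


def jarque_bera(xs):
    """Return (b, k, T, p_value) following (3.1) and (3.2)."""
    n = len(xs)
    mean = sum(xs) / n
    # Central sample moments with divisor n, as in (3.1).
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    b = m3 / m2 ** 1.5          # sample skewness
    k = m4 / m2 ** 2            # sample kurtosis
    t_stat = n / 6.0 * (b ** 2 + 0.25 * (k - 3.0) ** 2)
    # Asymptotic chi-squared(2) p-value: P(chi2_2 > t) = exp(-t / 2).
    p_value = math.exp(-t_stat / 2.0)
    return b, k, t_stat, p_value


# Illustrative use on simulated Gaussian data (seed chosen arbitrarily).
random.seed(1)
gaussian_sample = [random.gauss(0.0, 1.0) for _ in range(2020)]
print(jarque_bera(gaussian_sample))
```

Because the chi-squared approximation is asymptotic, p-values for small samples (such as the quarterly series with n = 32) should be read with some caution.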

In Table 3.2 tests of normality are applied to an arbitrary subgroup of ten of 
the stocks comprising the Dow Jones index. We take eight years of data spanning 
the period 1993-2000 and form daily, weekly, monthly and quarterly logarithmic 
returns. For each stock we calculate sample skewness and kurtosis and apply the 
Jarque-Bera test to the univariate time series. The daily and weekly return data fail
all tests; in particular, it is notable that there are some large values for the sample 
kurtosis. For the monthly data, the null hypothesis of normality is not formally 
rejected (that is, the p-value is greater than 0.05) for four of the stocks; for quarterly 
data, it is not rejected for five of the stocks, although here the sample size is small. 

The Jarque-Bera test (3.2) clearly rejects the normal hypothesis. In particular,
daily financial return data appear to have a much higher kurtosis than is consistent
with the normal distribution; their distribution is said to be leptokurtic, meaning
that it is narrower than the normal distribution in the centre but has longer and
heavier tails.

Further empirical analysis often suggests that the distribution of daily or other 
short-interval financial return data has tails that decay slowly according to a power 
law, rather than the faster, exponential-type decay of the tails of a normal distribution.
This means that we tend to see rather more extreme values than might be expected in 
such return data; we discuss this phenomenon further in Chapter 5, which is devoted 
to extreme value theory. 


3.1.3 Longer-Interval Return Series 


As we progressively increase the interval of the returns by moving from daily to 
weekly, monthly, quarterly and yearly data, the phenomena we have identified tend 
to become less pronounced. Volatility clustering decreases and returns begin to look 
both more iid and less heavy tailed. 

Beginning with a sample of n returns measured at some time interval (say daily or 
weekly), we can aggregate these to form longer-interval log-returns. The h-period 
log-return at time t is given by

\[ X_t^{(h)} = \ln\Bigl(\frac{S_t}{S_{t-h}}\Bigr) = \ln\Bigl(\frac{S_t}{S_{t-1}} \cdots \frac{S_{t-h+1}}{S_{t-h}}\Bigr) = \sum_{j=0}^{h-1} X_{t-j}, \qquad (3.3) \]

and from our original sample we can form a sample of non-overlapping h-period
returns $\{X_t^{(h)} : t = h, 2h, \ldots, \lfloor n/h \rfloor h\}$, where $\lfloor \cdot \rfloor$ denotes the integer part or floor
function; $\lfloor x \rfloor = \max\{k \in \mathbb{Z} : k \le x\}$ is the largest integer not greater than x.
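
The aggregation in (3.3) amounts to summing consecutive blocks of one-period log-returns. The following sketch illustrates this under our own conventions: the list x holds the returns in time order, the helper name aggregate_nonoverlapping is hypothetical, and the price series is made up for the example.

```python
import math


def aggregate_nonoverlapping(x, h):
    """Non-overlapping h-period log-returns at t = h, 2h, ..., floor(n/h)*h.

    x[0], ..., x[n-1] are the one-period log-returns X_1, ..., X_n.
    Any leftover returns at the end of the sample are dropped.
    """
    if h < 1:
        raise ValueError("h must be a positive integer")
    m = len(x) // h  # floor(n / h) aggregated values
    return [sum(x[j * h:(j + 1) * h]) for j in range(m)]


# Made-up prices S_0, ..., S_6 and their one-period log-returns.
prices = [100.0, 101.0, 99.0, 102.0, 104.0, 103.0, 105.0]
returns = [math.log(prices[i + 1] / prices[i]) for i in range(len(prices) - 1)]

three_period = aggregate_nonoverlapping(returns, 3)
# Each value should agree with ln(S_t / S_{t-3}).
print(three_period, math.log(102.0 / 100.0), math.log(105.0 / 102.0))
```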

Due to the sum structure of the h-period returns, it is to be expected that some 
central limit effect takes place, whereby their distribution becomes less leptokurtic 
and more normal as h is increased. Note that, although we have cast doubt on the 
iid model for daily data, a central limit theorem also applies to many stationary 
time-series processes, including the GARCH models that are a focus of Chapter 4. 

In Table 3.1 the Ljung-Box tests of randomness have also been applied to
non-overlapping monthly return data. For twenty out of thirty stocks the null hypothesis
of iid data is not rejected at the 5% level in Ljung-Box tests applied to both the raw
and absolute returns. It is therefore harder to find evidence of serial dependence in 
such monthly returns. 

Aggregating data to form non-overlapping h-period returns reduces the sample
size from n to $\lfloor n/h \rfloor$, and for longer-period returns (such as quarterly or yearly
returns) this may be a very serious reduction in the amount of data. An alternative
in this case is to form overlapping returns. For $1 \le k < h$ we can form overlapping
returns by taking

\[ \{X_t^{(h)} : t = h, h + k, h + 2k, \ldots, h + \lfloor (n - h)/k \rfloor k\}, \qquad (3.4) \]

which yields $1 + \lfloor (n - h)/k \rfloor$ values that overlap by an amount $h - k$. In forming
overlapping returns we can preserve a large number of data points, but we do build 
additional serial dependence into the data. Even if the original data were iid, over- 
lapping data would be profoundly dependent, which can greatly complicate their 
analysis. 
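
The index set in (3.4) can be made concrete with a short sketch. The function name aggregate_overlapping and the toy input are our own assumptions; the code simply sums windows of length h whose end points advance in steps of k.

```python
def aggregate_overlapping(x, h, k):
    """Overlapping h-period returns at t = h, h+k, ..., h + floor((n-h)/k)*k.

    x[0], ..., x[n-1] are the one-period log-returns X_1, ..., X_n and
    consecutive windows share h - k observations.
    """
    n = len(x)
    if not (1 <= k < h <= n):
        raise ValueError("need 1 <= k < h <= n")
    count = 1 + (n - h) // k
    # For t = h + j*k the window covers X_{t-h+1}, ..., X_t, i.e. x[j*k : j*k + h].
    return [sum(x[j * k:j * k + h]) for j in range(count)]


# Toy input: ten "returns" 1, ..., 10 so that the window sums are easy to follow.
toy = list(range(1, 11))
print(aggregate_overlapping(toy, h=4, k=2))
```

With n = 10, h = 4 and k = 2 the sketch returns 1 + floor(6/2) = 4 values, each sharing two observations with its neighbour, which is the source of the induced serial dependence.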

Notes and Comments


A number of texts contain extensive empirical analyses of financial return series and 
discussions of their properties. We mention in particular Taylor (2008), Alexander 
(2001), Tsay (2002) and Zivot and Wang (2003). For more discussion of the random- 
walk hypothesis for stock returns, and its shortcomings, see Lo and MacKinlay 
(1999). 

There are countless possible tests of univariate normality and a good starting point 
is the entry on “departures from normality, tests for” in Volume 2 of the Encyclopedia 
of Statistics (Kotz, Johnson and Read 1985). For an introduction to Q-Q plots see 
Rice (1995, pp. 353-357); for the widely applied Jarque-Bera test based on the
sample skewness and kurtosis, see Jarque and Bera (1987). 


3.2 Multivariate Stylized Facts 


In risk-management applications we are usually interested in multiple series of 
financial risk-factor changes. To the stylized facts identified in Section 3.1 we may 
add a number of stylized facts of a multivariate nature. 

We now consider multivariate return data $X_1, \ldots, X_n$. Each component series
$X_{1,j}, \ldots, X_{n,j}$ for $j = 1, \ldots, d$ is a series formed by logarithmic differencing of
a daily price, index or exchange-rate series as before. Commonly observed multi- 
variate stylized facts include the following. 


(M1) Multivariate return series show little evidence of cross-correlation, except for 
contemporaneous returns. 


(M2) Multivariate series of absolute returns show profound evidence of cross- 
correlation. 


(M3) Correlations between series (i.e. between contemporaneous returns) vary over 
time. 


(M4) Extreme returns in one series often coincide with extreme returns in several 
other series. 


3.2.1 Correlation between Series 


The first two observations are fairly obvious extensions of univariate stylized facts (1) 
and (2) from Section 3.1. Just as the stock returns for, say, Microsoft on days t and 
t +h (for h > 0) show very little serial correlation, so we generally detect very little 
correlation between the Microsoft return on day t and, say, the Coca-Cola return
on day t + h. Of course, stock returns on the same day may show non-negligible 
correlation, due to factors that affect the whole market on that day. When we look 
at absolute returns we should bear in mind that periods of high or low volatility are 
generally common to more than one stock. Returns of large magnitude in one stock 
may therefore tend to be followed on subsequent days by further returns of large 
magnitude for both that stock and other stocks, which can explain (M2). The issue 
of cross-correlation and its estimation is a topic in multivariate time-series analysis 
and is addressed with an example in Section 14.1. 

Stylized fact (M3) is a multivariate counterpart to univariate observation (4): that
volatility appears to vary with time. It can be interpreted in a couple of ways with 
reference to different underlying models. On the one hand, we could refer to a model 
in which there are so-called stationary regimes; during these regimes correlations 
are fixed but the regimes change from time to time. On the other hand, we could refer 
to more dynamic models in which a conditional correlation is changing all the time. 
Just as volatility is often formally modelled as the conditional standard deviation 
of returns given historical information, we can also devise models that feature a 
changing conditional correlation given historical information. Examples of such 
models include certain multivariate GARCH models, as discussed in Section 14.2. 
In the context of such models it is possible to demonstrate (M3) for many pairs of 
risk-factor return series. 

However, we should be careful about drawing conclusions about changing cor- 
relations based on more ad hoc analyses. To explain this further we consider two 
data sets. The first, shown in Figure 3.5, comprises the BMW and Siemens daily 
log-return series for the period from 23 January 1985 to 22 September 1994; there 
are precisely 2000 values. 

The second data set, shown in Figure 3.6, consists of an equal quantity of
randomly generated data from a bivariate t distribution. The parameters have been
chosen by fitting the distribution to the BMW-Siemens data by the method of max- 
imum likelihood. The fitted model is estimated to have 2.8 degrees of freedom and 
estimated correlation 0.72 (see Section 6.2.1 for more details of the multivariate 
t distribution). 

The two data sets show some superficial resemblance. The distribution of values 
is similar in both cases. However, the simulated data are independent and there is 
no serial dependence or volatility clustering. 

We estimate rolling correlation coefficients for both series using a moving window
of twenty-five days, which is approximately the number of trading days in a typical 
calendar month. These kinds of rolling empirical estimates are quite commonly 
used in practice to gather evidence of how key model parameters may change. In 
Figure 3.7 the resulting estimates are shown for the BMW-Siemens log-return data 
and the iid Student t-distributed data.

Remarkably, there are no obvious differences between the results for the two 
data sets; if anything, the range of estimated correlation values for the iid data is 
greater, despite the fact that they are generated from a single stationary model with 
a fixed correlation of 0.72. 

This illustrates that simple attempts to demonstrate (M3) using empirical correla- 
tion estimates should be interpreted with care. There is considerable error involved 
in estimating correlations from small samples, particularly when the underlying
distribution is a heavier-tailed bivariate distribution, such as a t distribution, rather than
a Gaussian distribution (see also Example 6.30 in this context). The most reliable 
way to substantiate (M3) and to decide in exactly what way correlation changes is 
to fit different models for changing correlation and then to make formal statistical 
comparisons of the models. 
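
The experiment with iid data can be imitated with the sketch below. It assumes the usual variance-mixture construction of the bivariate t distribution (a correlated Gaussian pair divided by the square root of an independent chi-squared variable over its degrees of freedom), uses the fitted values 2.8 and 0.72 quoted above, and picks an arbitrary seed; the helper names are ours and the output will not reproduce Figure 3.7 exactly.

```python
import math
import random


def bivariate_t(n, nu, rho, rng):
    """Draw n iid pairs from a bivariate t with nu degrees of freedom."""
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        # chi-squared(nu) is Gamma(shape = nu/2, scale = 2).
        w = rng.gammavariate(nu / 2.0, 2.0) / nu
        pairs.append((z1 / math.sqrt(w), z2 / math.sqrt(w)))
    return pairs


def correlation(pairs):
    """Sample (Pearson) correlation of a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(p[0] for p in pairs) / n
    my = sum(p[1] for p in pairs) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in pairs)
    sxx = sum((p[0] - mx) ** 2 for p in pairs)
    syy = sum((p[1] - my) ** 2 for p in pairs)
    return sxy / math.sqrt(sxx * syy)


def rolling_correlation(pairs, window):
    """Correlation estimates over every window of consecutive observations."""
    return [correlation(pairs[i - window:i])
            for i in range(window, len(pairs) + 1)]


rng = random.Random(0)
data = bivariate_t(2000, 2.8, 0.72, rng)
rolling = rolling_correlation(data, 25)
# The spread of the estimates is wide even though the true correlation is fixed.
print(min(rolling), max(rolling))
```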

Figure 3.5. (a) BMW and (b) Siemens log-return data for the period from 23 January 1985
to 22 September 1994 together with (c) pairwise scatterplot. Three extreme days on which 
large negative returns occurred are marked. The dates are 19 October 1987, 16 October 1989 
and 19 August 1991 (see Section 3.2.2 for historical commentary). 


3.2.2 Tail Dependence 


Stylized fact (M4) is often apparent when time series are compared. Consider again
the BMW and Siemens log-returns in Figure 3.5. In both the time-series plots and
the scatterplot, three days have been indicated with a number. These are days on
which large negative returns were observed for both stocks, and all three occurred
during periods of volatility on the German market. They are, respectively, 19 October
1987, Black Monday on Wall Street; 16 October 1989, when over 100 000 Germans
protested against the East German regime in Leipzig during the chain of events that
led to the fall of the Berlin Wall and German reunification; and 19 August 1991, the
day of the coup by communist hardliners during the reforming era of Gorbachev in
the USSR. Clearly, these are days on which momentous events led to joint extreme
values.

Figure 3.6. Two-thousand iid data points generated from a bivariate t distribution: (a) time
series of components and (b) pairwise scatterplot. The parameters have been set by fitting the
t distribution to the BMW-Siemens data in Figure 3.5.

Figure 3.7. Twenty-five-day rolling correlation estimates for the empirical data of
Figure 3.5 (top panel) and for the simulated iid data of Figure 3.6 (bottom panel). 


Related to stylized fact (M4) is the idea that “correlations go to one in times of 
market stress”. The three extreme days in Figure 3.5 correspond to points that are 
close to the diagonal of the scatterplot in the lower-left-hand corner, and it is easy 
to see why one might describe these as occasions on which correlations tend to one. 
It is quite difficult to formally test the hypothesis that model correlations are higher 
when volatilities are higher, and this should be done in the context of a multivariate 
time-series model incorporating either dynamic conditional correlations or regime 
changes. Once again, we should be cautious about interpreting simple analyses based 
on empirical correlation estimates, as we now show. 

In Figure 3.8 we perform an analysis in which we split the 2000 bivariate return 
observations into eighty non-overlapping twenty-five-day blocks. In each block we 
estimate the empirical correlation between the return series and the volatility of the 
two series. We plot the Fisher transform of the estimated correlation against the 
estimated volatility of the BMW series and then regress the former on the latter. 
There is a strongly significant regression relationship (shown by the line) between 
the correlation and volatility estimates. It is tempting to say that in stress periods 
where volatility is high, correlation is also high. The Fisher transform is a well- 
known variance-stabilizing transform that is appropriate when correlation is the 
dependent variable in a regression analysis. 

However, when exactly the same exercise is carried out for the data generated from 
a t distribution (Figure 3.6 (b)), the result is similar. In this case the observation that 
estimated correlation is higher in periods of higher estimated volatility is a pure 
artefact of estimation error for both quantities, since the true underlying correlation 
is fixed. 
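
The block analysis is easy to repeat on simulated data. The sketch below is only an illustration under stated assumptions: the data are iid draws from a bivariate t distribution generated by the usual variance-mixture construction, volatility is estimated by the sample standard deviation of the first component within each block, the Fisher transform is computed as atanh(r), and all function names and the seed are ours. No particular slope is claimed for a given seed.

```python
import math
import random


def simulate_bivariate_t(n, nu, rho, rng):
    """iid bivariate t pairs via a Gaussian pair scaled by sqrt(nu / chi2_nu)."""
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        w = rng.gammavariate(nu / 2.0, 2.0) / nu
        pairs.append((z1 / math.sqrt(w), z2 / math.sqrt(w)))
    return pairs


def block_stats(pairs, block):
    """Per block: (volatility of first series, Fisher transform of correlation)."""
    stats = []
    for start in range(0, len(pairs) - block + 1, block):
        chunk = pairs[start:start + block]
        xs = [p[0] for p in chunk]
        ys = [p[1] for p in chunk]
        mx, my = sum(xs) / block, sum(ys) / block
        sxx = sum((x - mx) ** 2 for x in xs)
        syy = sum((y - my) ** 2 for y in ys)
        sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        r = sxy / math.sqrt(sxx * syy)
        vol = math.sqrt(sxx / (block - 1))
        stats.append((vol, math.atanh(r)))  # atanh(r) = 0.5 * ln((1+r)/(1-r))
    return stats


def ols(points):
    """Least-squares intercept and slope for a regression of y on x."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points)
    slope = sxy / sxx
    return my - slope * mx, slope


rng = random.Random(1)
stats = block_stats(simulate_bivariate_t(2000, 2.8, 0.72, rng), 25)
intercept, slope = ols(stats)
print(len(stats), intercept, slope)
```

Rerunning with different seeds gives a feel for how often an apparently systematic relationship arises from estimation error alone.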

Figure 3.8. Fisher transforms of estimated correlations plotted against estimated
volatilities for eighty non-overlapping blocks of twenty-five observations: (a) the BMW-Siemens
log-return data in Figure 3.5; and (b) the simulated bivariate t data in Figure 3.6. In both
cases there is a significant regression relationship between estimated correlations and
estimated volatilities.


This example is not designed to argue against the view that correlations are higher 
when volatilities are higher; it is simply meant to show that it is difficult to demon- 
strate this using an ad hoc approach based on estimated correlations. Formal com- 
parison of different multivariate volatility and correlation models with differing 
specifications for correlation is required; some models of this kind are described in
Section 14.2. Moreover, while it may be partly true that useful multivariate time- 
series models for returns should have the property that conditional correlations tend 
to become large when volatilities are large, the phenomenon of simultaneous extreme 
values can also be addressed in other ways. 

For example, we can choose distributions in multivariate models that have so- 
called tail dependence or extremal dependence. Loosely speaking, this means mod- 
els in which the conditional probability of observing an extreme value for one risk 
factor given that we have observed an extreme value for another is non-negligible 
and, indeed, is in some cases quite large. A mathematical definition of this notion 
and a discussion of its importance may be found in Section 7.2.4. 


Notes and Comments 


Pitfalls in tests for changing correlations are addressed in an interesting paper 
by Boyer, Gibson and Loretan (1999), which argues against simplistic empirical 
analyses based on segmenting the data into normal and stressed regimes. See also 
Loretan and English (2000) for a discussion of correlation breakdowns during peri- 
ods of market instability. An interesting book on the importance of correlation in 
risk management is Engle (2009). 

Tail dependence has various definitions: see Joe (1997) and Coles, Heffernan and 
Tawn (1999). The importance of tail dependence in risk management was highlighted 
in Embrechts, McNeil and Straumann (1999), Embrechts, McNeil and Straumann 
(2002) and Mashal, Naldi and Zeevi (2003). It is now a recognized issue in the reg- 
ulatory literature: see, for example, the discussion of tail correlation in the CEIOPS 
consultation paper on the use of correlation in the standard formula for the solvency 
capital requirement (CEIOPS 2009). 


Part II 


Methodology 


4 


Financial Time Series 


Motivated by the discussion of the empirical properties of financial risk-factor 
change data in Chapter 3, in this chapter we present univariate time-series mod- 
els that mimic the properties of real return data. 

In Section 4.1 we review essential concepts in the analysis of time series, such as 
stationarity, autocorrelations and their estimation, white noise processes, and ARMA 
(autoregressive moving-average) processes. We then devote Section 4.2 to univari- 
ate ARCH and GARCH (generalized autoregressive conditionally heteroscedastic) 
processes for capturing the important phenomenon of volatility. 

GARCH models are certainly not the only models for describing the volatility of 
financial returns. Other important classes of model include discrete-time stochastic 
volatility models, long-memory GARCH models, continuous-time models fitted to 
discrete data, and models based on realized volatility calculated from high-frequency 
data; these alternative approaches are not handled in this book. 

Our emphasis on GARCH has two main motivations, the first being a practical one. 
We recall that in risk management we are typically dealing with very large numbers 
of risk factors, and our philosophy, expounded in Section 1.5, is that broad-brush 
techniques that capture the main risk features of many time series are more important 
than very detailed analyses of single series. The GARCH model lends itself to 
this approach and proves relatively easy to fit. There are also some multivariate 
extensions (see Chapter 14) that build in simple ways on the univariate models and 
that may be calibrated to a multivariate series in stages. This ease of use contrasts 
with other models where the fitting of a single series often presents a computational 
challenge (e.g. estimation of a stochastic volatility model via filtering or Gibbs 
sampling), and multivariate extensions have not been widely considered. Moreover, 
an average financial enterprise will typically collect daily data on its complete set 
of risk factors for the purposes of risk management, and this rules out some more 
sophisticated models that require higher-frequency data. 

Our second reason for concentrating on ARCH and GARCH models is didac- 
tic. These models for volatile return series have a status akin to ARMA models in 
classical time series; they belong, in our opinion, to the body of standard method- 
ology to which a student of the subject should be exposed. A quantitative risk man- 
ager who understands GARCH has a good basis for understanding more complex 
models and a good framework for talking about historical volatility in a rational 
way. He/she may also appreciate more clearly the role of more ad hoc volatility 
estimation methods such as the exponentially weighted moving-average (EWMA)
procedure. 


4.1 Fundamentals of Time Series Analysis 


This section provides a short summary of the essentials of classical univariate time- 
series analysis with a focus on that which is relevant for modelling risk-factor return 
series. We have based the presentation on Brockwell and Davis (1991, 2002), so 
these texts may be used as supplementary reading. 


4.1.1 Basic Definitions 


A time-series model for a single risk factor is a discrete-time stochastic process
$(X_t)_{t \in \mathbb{Z}}$, i.e. a family of rvs, indexed by the integers and defined on some probability
space $(\Omega, \mathcal{F}, P)$.


Moments of a time series. Assuming they exist, we define the mean function $\mu(t)$
and the autocovariance function $\gamma(t, s)$ of $(X_t)_{t \in \mathbb{Z}}$ by

\[ \mu(t) = E(X_t), \quad t \in \mathbb{Z}, \]
\[ \gamma(t, s) = E\bigl((X_t - \mu(t))(X_s - \mu(s))\bigr), \quad t, s \in \mathbb{Z}. \]

It follows that the autocovariance function satisfies $\gamma(t, s) = \gamma(s, t)$ for all t, s, and
$\gamma(t, t) = \operatorname{var}(X_t)$.


Stationarity. Generally, the processes we consider will be stationary in one or both 
of the following two senses. 


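Definition 4.1 (strict stationarity). The time series $(X_t)_{t \in \mathbb{Z}}$ is strictly stationary if

\[ (X_{t_1}, \ldots, X_{t_n}) \stackrel{d}{=} (X_{t_1 + k}, \ldots, X_{t_n + k}) \]

for all $t_1, \ldots, t_n, k \in \mathbb{Z}$ and for all $n \in \mathbb{N}$.


Definition 4.2 (covariance stationarity). The time series $(X_t)_{t \in \mathbb{Z}}$ is covariance
stationary (or weakly or second-order stationary) if the first two moments exist and
satisfy

\[ \mu(t) = \mu, \quad t \in \mathbb{Z}, \]
\[ \gamma(t, s) = \gamma(t + k, s + k), \quad t, s, k \in \mathbb{Z}. \]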


Both these definitions attempt to formalize the notion that the behaviour of a time 
series is similar in any epoch in which we might observe it. Systematic changes in 
mean, variance or the covariances between equally spaced observations are incon- 
sistent with stationarity. 

It may be easily verified that a strictly stationary time series with finite variance 
is covariance stationary, but it is important to note that we may define infinite- 
variance processes (including certain ARCH and GARCH processes) that are strictly 
stationary but not covariance stationary. 

Autocorrelation in stationary time series. The definition of covariance stationarity
implies that for all s, t we have $\gamma(t - s, 0) = \gamma(t, s) = \gamma(s, t) = \gamma(s - t, 0)$, so
that the covariance between $X_t$ and $X_s$ only depends on their temporal separation
$|s - t|$, which is known as the lag. Thus, for a covariance-stationary process we
write the autocovariance function as a function of one variable:

\[ \gamma(h) := \gamma(h, 0), \quad \forall h \in \mathbb{Z}. \]

Noting that $\gamma(0) = \operatorname{var}(X_t)$, $\forall t$, we can now define the autocorrelation function of
a covariance-stationary process.


Definition 4.3 (autocorrelation function). The autocorrelation function (ACF)
$\rho(h)$ of a covariance-stationary process $(X_t)_{t \in \mathbb{Z}}$ is

\[ \rho(h) = \rho(X_h, X_0) = \gamma(h)/\gamma(0), \quad \forall h \in \mathbb{Z}. \]


We speak of the autocorrelation or serial correlation p(h) at lag h. In classical 
time-series analysis the set of serial correlations and their empirical analogues esti- 
mated from data are the objects of principal interest. The study of autocorrelations 
is known as analysis in the time domain. 
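
As a rough illustration of an empirical analogue of the ACF, the sketch below computes sample autocorrelations from a finite series. It assumes the commonly used estimator in which both the lag-h and lag-0 sample autocovariances are divided by n and centred at the overall sample mean; the function name sample_acf is our own.

```python
def sample_acf(x, max_lag):
    """Sample autocorrelations rho_hat(0), ..., rho_hat(max_lag).

    Uses gamma_hat(h) = (1/n) * sum_{t=1}^{n-h} (x_{t+h} - xbar)(x_t - xbar)
    and rho_hat(h) = gamma_hat(h) / gamma_hat(0).
    """
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    gamma0 = sum(d * d for d in dev) / n
    acf = []
    for h in range(max_lag + 1):
        gamma_h = sum(dev[t] * dev[t + h] for t in range(n - h)) / n
        acf.append(gamma_h / gamma0)
    return acf


# Small worked example that can be checked by hand.
print(sample_acf([1.0, 2.0, 3.0, 4.0], 2))
```

For the series 1, 2, 3, 4 the deviations from the mean are -1.5, -0.5, 0.5, 1.5, which gives sample autocorrelations of 0.25 at lag 1 and -0.3 at lag 2.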


White noise processes. The basic building blocks for creating useful time-series 
models are stationary processes without serial correlation, known as white noise 
processes and defined as follows. 


Definition 4.4 (white noise). $(X_t)_{t \in \mathbb{Z}}$ is a white noise process if it is covariance
stationary with autocorrelation function

\[ \rho(h) = \begin{cases} 1, & h = 0, \\ 0, & h \ne 0. \end{cases} \]

A white noise process centred to have mean 0 with variance $\sigma^2 = \operatorname{var}(X_t)$ will
be denoted WN$(0, \sigma^2)$. A simple example of a white noise process is a series of iid
rvs with finite variance, and this is known as a strict white noise process.


Definition 4.5 (strict white noise). $(X_t)_{t \in \mathbb{Z}}$ is a strict white noise process if it is a
series of iid, finite-variance rvs.


A strict white noise (SWN) process centred to have mean 0 and variance $\sigma^2$
will be denoted SWN$(0, \sigma^2)$. Although SWN is the easiest kind of noise process to
understand, it is not the only noise that we will use. We will later see that covariance-
stationary ARCH and GARCH processes are in fact white noise processes.


Martingale difference. One further noise concept that we use, particularly when we 
come to discuss volatility and GARCH processes, is that of a martingale-difference 
sequence. We recall that a martingale is a sequence of integrable rvs $(M_t)$ such that
the expected value of $M_t$ given the previous history of the sequence is $M_{t-1}$. This
implies that if we define $(X_t)$ by taking first differences of the sequence $(M_t)$, then
the expected value of $X_t$ given information about previous values is 0. We have
observed in Section 3.1 that this property may be appropriate for financial return
data. A martingale difference is often said to model our winnings in consecutive
rounds of a fair game.

To discuss this concept more precisely, we assume that the time series $(X_t)_{t \in \mathbb{Z}}$
is adapted to some filtration $(\mathcal{F}_t)_{t \in \mathbb{Z}}$ that represents the accrual of information
over time. The sigma algebra $\mathcal{F}_t$ represents the available information at time t, and
typically this will be the information contained in past and present values of the time
series itself $(X_s)_{s \le t}$, which we refer to as the history up to time t and denote by
$\mathcal{F}_t = \sigma(\{X_s : s \le t\})$; the corresponding filtration is known as the natural filtration.


Definition 4.6 (martingale difference). The time series $(X_t)_{t \in \mathbb{Z}}$ is known as a
martingale-difference sequence with respect to the filtration $(\mathcal{F}_t)_{t \in \mathbb{Z}}$ if $E|X_t| < \infty$,
$X_t$ is $\mathcal{F}_t$-measurable (adapted) and

\[ E(X_t \mid \mathcal{F}_{t-1}) = 0, \quad \forall t \in \mathbb{Z}. \]

Obviously the unconditional mean of such a process is also zero:

\[ E(X_t) = E(E(X_t \mid \mathcal{F}_{t-1})) = 0, \quad \forall t \in \mathbb{Z}. \]

Moreover, if $E(X_t^2) < \infty$ for all t, then autocovariances satisfy

\[ \gamma(t, s) = E(X_t X_s) = \begin{cases} E(E(X_t X_s \mid \mathcal{F}_{s-1})) = E(X_t E(X_s \mid \mathcal{F}_{s-1})) = 0, & t < s, \\ E(E(X_t X_s \mid \mathcal{F}_{t-1})) = E(X_s E(X_t \mid \mathcal{F}_{t-1})) = 0, & t > s. \end{cases} \]

Thus a finite-variance martingale-difference sequence has zero mean and zero
covariance. If the variance is constant for all t, it is a white noise process.


4.1.2 ARMA Processes


The family of classical ARMA processes is widely used in many traditional
applications of time-series analysis. They are covariance-stationary processes that are
constructed using white noise as a basic building block. As a general notational
convention in this section and the remainder of the chapter we will denote white
noise by $(\varepsilon_t)_{t \in \mathbb{Z}}$, and strict white noise by $(Z_t)_{t \in \mathbb{Z}}$.


Definition 4.7 (ARMA process). Let $(\varepsilon_t)_{t \in \mathbb{Z}}$ be WN$(0, \sigma_\varepsilon^2)$. The process $(X_t)_{t \in \mathbb{Z}}$ is
a zero-mean ARMA(p, q) process if it is a covariance-stationary process satisfying
difference equations of the form

\[ X_t - \phi_1 X_{t-1} - \cdots - \phi_p X_{t-p} = \varepsilon_t + \theta_1 \varepsilon_{t-1} + \cdots + \theta_q \varepsilon_{t-q}, \quad \forall t \in \mathbb{Z}. \qquad (4.1) \]

$(X_t)$ is an ARMA process with mean $\mu$ if the centred series $(X_t - \mu)_{t \in \mathbb{Z}}$ is a
zero-mean ARMA(p, q) process.


Note that, according to our definition, there is no such thing as a non-covariance- 
stationary ARMA process. Whether the process is strictly stationary or not will 
depend on the exact nature of the driving white noise, also known as the process 
of innovations. If the innovations are iid, or themselves form a strictly stationary 
process, then the ARMA process will also be strictly stationary. 

For all practical purposes we can restrict our study of ARMA processes to causal
ARMA processes. By this we mean processes satisfying the equations (4.1), which
have a representation of the form

\[ X_t = \sum_{i=0}^{\infty} \psi_i \varepsilon_{t-i}, \qquad (4.2) \]

where the $\psi_i$ are coefficients that must satisfy

\[ \sum_{i=0}^{\infty} |\psi_i| < \infty. \qquad (4.3) \]


Remark 4.8. The so-called absolute summability condition (4.3) is a technical
condition that ensures that $E|X_t| < \infty$. This guarantees that the infinite sum
in (4.2) converges absolutely, almost surely, meaning that both $\sum_{i=0}^{\infty} |\psi_i||\varepsilon_{t-i}|$ and
$\sum_{i=0}^{\infty} \psi_i \varepsilon_{t-i}$ are finite with probability 1 (see Brockwell and Davis 1991,
Proposition 3.1.1).


We now verify by direct calculation that causal ARMA processes are indeed 
covariance stationary and calculate the form of their autocorrelation function, before 
going on to look at some simple standard examples. 


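Proposition 4.9. Any process satisfying (4.2) and (4.3) is covariance stationary,
with an autocorrelation function given by

\[ \rho(h) = \frac{\sum_{i=0}^{\infty} \psi_i \psi_{i+|h|}}{\sum_{i=0}^{\infty} \psi_i^2}, \quad h \in \mathbb{Z}. \qquad (4.4) \]


Proof. Obviously, for all t we have $E(X_t) = 0$ and $\operatorname{var}(X_t) = \sigma_\varepsilon^2 \sum_{i=0}^{\infty} \psi_i^2 < \infty$,
due to (4.3). Moreover, the autocovariances are given by

\[ \operatorname{cov}(X_t, X_{t+h}) = E(X_t X_{t+h}) = E\Bigl(\sum_{i=0}^{\infty} \psi_i \varepsilon_{t-i} \sum_{j=0}^{\infty} \psi_j \varepsilon_{t+h-j}\Bigr). \]

Since $(\varepsilon_t)$ is white noise, it follows that $E(\varepsilon_{t-i}\varepsilon_{t+h-j}) \ne 0 \iff j = i + h$, and
hence that

\[ \gamma(h) = \operatorname{cov}(X_t, X_{t+h}) = \sigma_\varepsilon^2 \sum_{i=0}^{\infty} \psi_i \psi_{i+|h|}, \quad h \in \mathbb{Z}, \qquad (4.5) \]

which depends only on the lag h and not on t. The autocorrelation function follows
easily.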


Example 4.10 (MA(q) process). It is clear that a pure moving-average process 


\[ X_t = \sum_{i=1}^{q} \theta_i \varepsilon_{t-i} + \varepsilon_t \qquad (4.6) \]

forms a simple example of a causal process of the form (4.2). It is easily inferred
from (4.4) that the autocorrelation function is given by

\[ \rho(h) = \frac{\sum_{i=0}^{q-|h|} \theta_i \theta_{i+|h|}}{\sum_{i=0}^{q} \theta_i^2}, \quad |h| \in \{0, 1, \ldots, q\}, \]

where $\theta_0 = 1$. For $|h| > q$ we have $\rho(h) = 0$, and the autocorrelation function is
said to cut off at lag q. If this feature is observed in the estimated autocorrelations 
of empirical data, it is often taken as an indicator of moving-average behaviour. A 
realization of an MA(4) process together with the theoretical form of its ACF is 
shown in Figure 4.1. 
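
The MA(q) autocorrelation formula is simple to evaluate directly. The sketch below, with a function name of our own choosing, applies it to the MA(4) coefficients quoted in the caption of Figure 4.1 and shows the cut-off after lag 4.

```python
def ma_acf(theta, h):
    """Theoretical ACF of an MA(q) process with coefficients theta_1..theta_q."""
    coeffs = [1.0] + list(theta)  # theta_0 = 1
    q = len(theta)
    h = abs(h)
    if h > q:
        return 0.0  # the ACF cuts off beyond lag q
    numerator = sum(coeffs[i] * coeffs[i + h] for i in range(q - h + 1))
    denominator = sum(c * c for c in coeffs)
    return numerator / denominator


theta = [-0.8, 0.4, 0.2, -0.3]  # MA(4) parameters from the caption of Figure 4.1
for lag in range(7):
    print(lag, ma_acf(theta, lag))
```

For these coefficients the denominator is 1.93 and the lag-one numerator is -1.10, so the lag-one autocorrelation is roughly -0.57, while all autocorrelations beyond lag 4 vanish.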


Example 4.11 (AR(1) process). The first-order AR process satisfies the set of 
difference equations 


\[ X_t = \phi_1 X_{t-1} + \varepsilon_t, \quad \forall t. \qquad (4.7) \]

This process is causal if and only if $|\phi_1| < 1$, and this may be understood intuitively
by iterating the equation (4.7) to get

\[ X_t = \phi_1(\phi_1 X_{t-2} + \varepsilon_{t-1}) + \varepsilon_t = \phi_1^{k+1} X_{t-k-1} + \sum_{i=0}^{k} \phi_1^i \varepsilon_{t-i}. \]

Using more careful probabilistic arguments it may be shown that the condition
$|\phi_1| < 1$ ensures that the first term disappears as $k \to \infty$ and the second term
converges. The process

\[ X_t = \sum_{i=0}^{\infty} \phi_1^i \varepsilon_{t-i} \qquad (4.8) \]

turns out to be the unique solution of the defining equations (4.7). It may be easily
verified that this is a process of the form (4.2) and that $\sum_{i=0}^{\infty} |\phi_1|^i = (1 - |\phi_1|)^{-1}$, so
that (4.3) is satisfied. Looking at the form of the solution (4.8), we see that the AR(1)
process can be represented as an MA($\infty$) process: an infinite-order moving-average
process.

The autocovariance and autocorrelation functions of the process may be calculated
from (4.5) and (4.4) to be

\[ \gamma(h) = \frac{\phi_1^{|h|} \sigma_\varepsilon^2}{1 - \phi_1^2}, \qquad \rho(h) = \phi_1^{|h|}, \quad h \in \mathbb{Z}. \]


Thus the ACF is exponentially decaying with possibly alternating sign. A realization 
of an AR(1) process together with the theoretical form of its ACF is shown in 
Figure 4.1. 
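
The iteration of (4.7) can be checked numerically. In the sketch below the AR(1) coefficient 0.8 is taken from Figure 4.1, while the Gaussian innovations, the seed and the starting value are arbitrary assumptions of ours. The code confirms that X_t equals the iterated right-hand side for several values of k and prints the weight attached to the distant past.

```python
import random

phi = 0.8
rng = random.Random(42)
n = 200
eps = [rng.gauss(0.0, 1.0) for _ in range(n)]

# Generate the path by the defining recursion X_t = phi * X_{t-1} + eps_t.
x = [0.0] * n
x[0] = eps[0]  # arbitrary start-up value (an assumption of this sketch)
for t in range(1, n):
    x[t] = phi * x[t - 1] + eps[t]


def iterated(t, k):
    """phi^(k+1) * X_{t-k-1} + sum_{i=0}^{k} phi^i * eps_{t-i}."""
    tail = phi ** (k + 1) * x[t - k - 1]
    head = sum(phi ** i * eps[t - i] for i in range(k + 1))
    return tail + head


t = n - 1
for k in (0, 5, 50):
    # The last column is the weight on X_{t-k-1}; it shrinks geometrically in k.
    print(k, x[t], iterated(t, k), phi ** (k + 1))
```

When the absolute value of the coefficient is below one, the weight on the remote value decays geometrically, which is the intuition behind the causal representation (4.8).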

Figure 4.1. A number of simulated ARMA processes with their autocorrelation
functions (dashed) and correlograms. Innovations are Gaussian. (a) AR(1), $\phi_1 = 0.8$. (b) MA(4),
$\theta_1, \ldots, \theta_4 = -0.8, 0.4, 0.2, -0.3$. (c) ARMA(1, 1), $\phi_1 = 0.6$, $\theta_1 = 0.5$.


Remarks on general ARMA theory. In the case of the general ARMA process of 
Definition 4.7, the issue of whether this process has a causal representation of the 
form (4.2) is resolved by the study of two polynomials in the complex plane, which 
are given in terms of the ARMA model parameters by 


$$\tilde{\phi}(z) = 1 - \phi_1 z - \cdots - \phi_p z^p,$$
$$\tilde{\theta}(z) = 1 + \theta_1 z + \cdots + \theta_q z^q.$$


Provided that $\tilde{\phi}(z)$ and $\tilde{\theta}(z)$ have no common roots, then the ARMA process is a
causal process satisfying (4.2) and (4.3) if and only if $\tilde{\phi}(z)$ has no roots in the unit
circle $|z| \le 1$. The coefficients $\psi_i$ in the representation (4.2) are determined by the
equation

$$\sum_{i=0}^{\infty} \psi_i z^i = \frac{\tilde{\theta}(z)}{\tilde{\phi}(z)}, \qquad |z| \le 1.$$
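
The power-series identity above can be turned into a simple recursion by matching coefficients of $z^j$ on both sides of $\tilde{\phi}(z) \sum_i \psi_i z^i = \tilde{\theta}(z)$. The sketch below is one way to do this in Python; the function name `psi_weights` and the list-based parameter layout are our own assumptions.

```python
# Sketch: psi-weights of a causal ARMA(p, q) process by coefficient matching.

def psi_weights(phi, theta, n):
    """Return psi_0, ..., psi_{n-1} of theta(z) / phi(z).

    phi = [phi_1, ..., phi_p], theta = [theta_1, ..., theta_q].
    Matching powers of z gives
        psi_j = theta_j + sum_{k=1}^{min(j, p)} phi_k * psi_{j-k},
    with theta_0 = 1 and theta_j = 0 for j > q.
    """
    psi = []
    for j in range(n):
        if j == 0:
            value = 1.0
        elif j <= len(theta):
            value = theta[j - 1]
        else:
            value = 0.0
        for k in range(1, min(j, len(phi)) + 1):
            value += phi[k - 1] * psi[j - k]
        psi.append(value)
    return psi


ar1 = psi_weights([0.8], [], 5)        # psi_i = 0.8^i, as in (4.8)
ma2 = psi_weights([], [0.4, -0.3], 5)  # pure MA: weights cut off after lag q
```

For a pure AR(1) model the recursion reproduces the geometric weights of (4.8), and for a pure MA model it simply returns the MA coefficients followed by zeros.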

Example 4.12 (ARMA(1, 1) process). For the process given by

$$X_t - \phi_1 X_{t-1} = \varepsilon_t + \theta_1 \varepsilon_{t-1}, \quad \forall t \in \mathbb{Z},$$

the complex polynomials are $\tilde{\phi}(z) = 1 - \phi_1 z$ and $\tilde{\theta}(z) = 1 + \theta_1 z$, and these have no
common roots provided $\phi_1 + \theta_1 \ne 0$. The solution of $\tilde{\phi}(z) = 0$ is $z = 1/\phi_1$ and this
is outside the unit circle provided $|\phi_1| < 1$, so that this is the condition for causality
(as in the AR(1) model of Example 4.11).

The representation (4.2) can be obtained by considering

$$\sum_{i=0}^{\infty} \psi_i z^i = \frac{1 + \theta_1 z}{1 - \phi_1 z} = (1 + \theta_1 z)(1 + \phi_1 z + \phi_1^2 z^2 + \cdots), \qquad |z| \le 1,$$

and is easily calculated to be

$$X_t = \varepsilon_t + (\phi_1 + \theta_1) \sum_{i=1}^{\infty} \phi_1^{i-1} \varepsilon_{t-i}. \tag{4.9}$$
Using (4.4) we may calculate that for $h \ne 0$ the ACF is

$$\rho(h) = \frac{\phi_1^{|h|-1}(\phi_1 + \theta_1)(1 + \phi_1\theta_1)}{1 + \theta_1^2 + 2\phi_1\theta_1}.$$

A realization of an ARMA(1, 1) process together with the theoretical form of its
ACF is shown in Figure 4.1.
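
The ARMA(1, 1) autocorrelation function can be checked numerically against the MA($\infty$) weights $\psi_0 = 1$, $\psi_i = (\phi_1 + \theta_1)\phi_1^{i-1}$. The sketch below assumes the linear-process formula $\rho(h) = \sum_i \psi_i \psi_{i+|h|} / \sum_i \psi_i^2$; the function names and the truncation point are illustrative.

```python
# Sketch: ARMA(1, 1) ACF, closed form versus truncated psi-weight sums.

def arma11_acf(phi, theta, h):
    """Closed-form ACF of a causal ARMA(1, 1) process."""
    if h == 0:
        return 1.0
    numerator = phi ** (abs(h) - 1) * (phi + theta) * (1.0 + phi * theta)
    return numerator / (1.0 + theta ** 2 + 2.0 * phi * theta)


def arma11_acf_from_psi(phi, theta, h, n_terms=500):
    """ACF from the truncated MA(infinity) weights of the ARMA(1, 1) process."""
    psi = [1.0] + [(phi + theta) * phi ** (i - 1) for i in range(1, n_terms)]
    h = abs(h)
    num = sum(psi[i] * psi[i + h] for i in range(n_terms - h))
    den = sum(p * p for p in psi)
    return num / den


# Parameter values of panel (c) in Figure 4.1.
rho_closed = [arma11_acf(0.6, 0.5, h) for h in range(6)]
rho_series = [arma11_acf_from_psi(0.6, 0.5, h) for h in range(6)]
```

The two lists agree up to truncation error, and beyond lag one the ACF decays geometrically at rate $\phi_1$.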


Invertibility. Equation (4.9) shows how the ARMA(1, 1) process may be thought
of as an MA($\infty$) process. In fact, if we impose the condition $|\theta_1| < 1$, we can also
express $(X_t)$ as the AR($\infty$) process given by

$$X_t = \varepsilon_t + (\phi_1 + \theta_1) \sum_{i=1}^{\infty} (-\theta_1)^{i-1} X_{t-i}. \tag{4.10}$$

If we rearrange this to be an equation for $\varepsilon_t$, then we see that we can, in a sense,
“reconstruct” the latest innovation $\varepsilon_t$ from the entire history of the process $(X_s)_{s \le t}$.
The condition $|\theta_1| < 1$ is known as an invertibility condition, and for the general
ARMA(p, q) process the invertibility condition is that $\tilde{\theta}(z)$ should have no roots
in the unit circle $|z| \le 1$. In practice, the models we fit to real data will be both
invertible and causal solutions of the ARMA-defining equations. 
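
To illustrate how an innovation can be reconstructed from past observations, the following sketch simulates an ARMA(1, 1) path and recovers each $\varepsilon_t$ from the rearranged AR($\infty$) form (4.10). It assumes, purely for illustration, that all pre-sample values of the process and the innovations are zero, so that the truncated sum is exact; with real data the truncation error decays like $|\theta_1|^t$. All names and parameter values are our own choices.

```python
import random

# Sketch: recover innovations of an invertible ARMA(1, 1) from its own history.

def simulate_arma11(phi, theta, n, seed=0):
    """Simulate X_t = phi X_{t-1} + eps_t + theta eps_{t-1} with zero pre-sample values."""
    rng = random.Random(seed)
    eps = [rng.gauss(0.0, 1.0) for _ in range(n)]
    x = []
    for t in range(n):
        prev_x = x[t - 1] if t > 0 else 0.0
        prev_eps = eps[t - 1] if t > 0 else 0.0
        x.append(phi * prev_x + eps[t] + theta * prev_eps)
    return x, eps


def innovation_from_history(x, t, phi, theta):
    """eps_t = X_t - (phi + theta) * sum_{i>=1} (-theta)^(i-1) X_{t-i}, truncated at the sample start."""
    weighted_past = sum((-theta) ** (i - 1) * x[t - i] for i in range(1, t + 1))
    return x[t] - (phi + theta) * weighted_past


phi, theta = 0.6, 0.5
x, eps = simulate_arma11(phi, theta, 200)
recovered = [innovation_from_history(x, t, phi, theta) for t in range(len(x))]
max_error = max(abs(r - e) for r, e in zip(recovered, eps))
```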


Models for the conditional mean. Consider a general invertible ARMA model with 
non-zero mean. For what comes later it will be useful to observe that we can write 
such models as 


$$X_t = \mu_t + \varepsilon_t, \qquad \mu_t = \mu + \sum_{i=1}^{p} \phi_i (X_{t-i} - \mu) + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j}. \tag{4.11}$$

Since we have assumed invertibility, the terms $\varepsilon_{t-j}$, and hence $\mu_t$, can be written in
terms of the infinite past of the process up to time $t - 1$; $\mu_t$ is said to be measurable
with respect to $\mathcal{F}_{t-1} = \sigma(\{X_s : s \le t - 1\})$.

If we make the assumption that the white noise $(\varepsilon_t)_{t \in \mathbb{Z}}$ is a martingale-difference
sequence (see Definition 4.6) with respect to $(\mathcal{F}_t)_{t \in \mathbb{Z}}$, then $E(X_t \mid \mathcal{F}_{t-1}) = \mu_t$.
In other words, such an ARMA process can be thought of as putting a particular
structure on the conditional mean $\mu_t$ of the process. ARCH and GARCH processes
will later be seen to put structure on the conditional variance $\operatorname{var}(X_t \mid \mathcal{F}_{t-1})$.


ARIMA models. In traditional time-series analysis we often consider an even larger 
class of model known as ARIMA models, or autoregressive integrated moving-average
models. Let $\nabla$ denote the difference operator, so that for a time-series
process $(Y_t)_{t \in \mathbb{Z}}$ we have $\nabla Y_t = Y_t - Y_{t-1}$. Denote repeated differencing by $\nabla^d$,
where

$$\nabla^d Y_t = \begin{cases} \nabla Y_t, & d = 1, \\ \nabla^{d-1}(\nabla Y_t) = \nabla^{d-1}(Y_t - Y_{t-1}), & d > 1. \end{cases} \tag{4.12}$$

The time series $(Y_t)$ is said to be an ARIMA(p, d, q) process if the differenced
series $(X_t)$ given by $X_t = \nabla^d Y_t$ is an ARMA(p, q) process. For $d \ge 1$, ARIMA
processes are non-stationary processes. They are popular in practice because the
operation of differencing (once or more than once) can turn a data set that is obviously
“non-stationary” into a data set that might plausibly be modelled by a stationary
ARMA process. For example, if we use an ARMA(p, q) process to model daily
log-returns of some price series $(S_t)$, then we are really saying that the original
logarithmic price series $(\ln S_t)$ follows an ARIMA(p, 1, q) model.

When the word integrated is used in the context of time series it generally implies 
that we are looking at a non-stationary process that might be made stationary by 
differencing; see also the discussion of IGARCH models in Section 4.2.2. 
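
The recursive definition (4.12) translates directly into code. The following sketch is a minimal Python version; the function name `difference` and the example series are illustrative assumptions.

```python
# Sketch: repeated differencing following the recursive definition (4.12).

def difference(y, d=1):
    """Apply the difference operator d times to the list y."""
    if d < 1:
        raise ValueError("d must be a positive integer")
    first = [y[t] - y[t - 1] for t in range(1, len(y))]
    return first if d == 1 else difference(first, d - 1)


quadratic_trend = [t * t for t in range(6)]   # 0, 1, 4, 9, 16, 25
once = difference(quadratic_trend, 1)         # still trending
twice = difference(quadratic_trend, 2)        # constant
```

Each application shortens the series by one observation; applied once to a series of log-prices, the function returns the log-returns.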


4.1.3 Analysis in the Time Domain 


We now assume that we have a sample $X_1, \ldots, X_n$ from a covariance-stationary
time-series model $(X_t)_{t \in \mathbb{Z}}$. Analysis in the time domain involves calculating empirical
estimates of autocovariances and autocorrelations from this random sample and
using these estimates to make inferences about the serial dependence structure of 
the underlying process. 


Correlogram. The sample autocovariances are calculated according to

$$\hat{\gamma}(h) = \frac{1}{n} \sum_{t=1}^{n-h} (X_{t+h} - \bar{X})(X_t - \bar{X}), \qquad 0 \le h < n,$$

where $\bar{X} = \sum_{t=1}^{n} X_t / n$ is the sample mean, which estimates $\mu$, the mean of the
time series. From these we calculate the sample ACF:

$$\hat{\rho}(h) = \hat{\gamma}(h)/\hat{\gamma}(0), \qquad 0 \le h < n.$$

The correlogram is the plot $\{(h, \hat{\rho}(h)) : h = 0, 1, 2, \ldots\}$, which is designed to facilitate
the interpretation of the sample ACF. Correlograms for various simulated ARMA
processes are shown in Figure 4.1; note that the estimated correlations correspond 
reasonably closely to the theoretical ACF for these particular realizations. 
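
The estimators $\hat{\gamma}(h)$ and $\hat{\rho}(h)$ are straightforward to compute. The sketch below follows the definitions above, including the divisor $n$ rather than $n - h$; the function name and the toy data are illustrative.

```python
# Sketch: sample autocorrelation function with the divisor n used in the text.

def sample_acf(x, max_lag):
    """Return [rho_hat(0), ..., rho_hat(max_lag)] for the sample x."""
    n = len(x)
    mean = sum(x) / n

    def acov(h):
        return sum((x[t + h] - mean) * (x[t] - mean) for t in range(n - h)) / n

    gamma0 = acov(0)
    return [acov(h) / gamma0 for h in range(max_lag + 1)]


acf = sample_acf([1.0, 2.0, 3.0, 4.0, 5.0], 2)
```

Plotting the returned values against the lag gives the correlogram.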

To interpret such estimators of serial correlation, we need to know something
about their behaviour for particular time series. The following general result is for 
causal linear processes, which are processes of the form (4.2) driven by strict white 
noise. 


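Theorem 4.13. Let $(X_t)_{t \in \mathbb{Z}}$ be the linear process

$$X_t - \mu = \sum_{i=0}^{\infty} \psi_i Z_{t-i}, \quad \text{where } \sum_{i=0}^{\infty} |\psi_i| < \infty, \ (Z_t)_{t \in \mathbb{Z}} \sim \mathrm{SWN}(0, \sigma_Z^2).$$

Suppose that either $E(Z_t^4) < \infty$ or $\sum_{i=0}^{\infty} i\psi_i^2 < \infty$. Then, for $h \in \{1, 2, \ldots\}$, we
have

$$\sqrt{n}\,(\hat{\boldsymbol{\rho}}(h) - \boldsymbol{\rho}(h)) \xrightarrow{d} N_h(\boldsymbol{0}, W),$$

where

$$\hat{\boldsymbol{\rho}}(h) = (\hat{\rho}(1), \ldots, \hat{\rho}(h))', \qquad \boldsymbol{\rho}(h) = (\rho(1), \ldots, \rho(h))',$$

$N_h$ denotes an h-dimensional normal distribution (see Section 6.1.3), $\boldsymbol{0}$ is the
h-dimensional vector of zeros, and $W$ is a covariance matrix with elements

$$W_{ij} = \sum_{k=1}^{\infty} (\rho(k+i) + \rho(k-i) - 2\rho(i)\rho(k))(\rho(k+j) + \rho(k-j) - 2\rho(j)\rho(k)).$$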


Proof. This follows as a special case of a result in Brockwell and Davis (1991, 
pp. 221-223). 
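
The matrix $W$ can be evaluated numerically for any given ACF by truncating the infinite sum. The following sketch does this in Python; the function names, the truncation point `k_max` and the two example ACFs are our own illustrative assumptions.

```python
# Sketch: the asymptotic covariance matrix W of Theorem 4.13 (truncated sum).

def bartlett_w(rho, h, k_max=500):
    """h x h matrix W; rho is a function giving the ACF at any integer lag."""
    def term(k, i):
        return rho(k + i) + rho(k - i) - 2.0 * rho(i) * rho(k)

    return [[sum(term(k, i) * term(k, j) for k in range(1, k_max + 1))
             for j in range(1, h + 1)]
            for i in range(1, h + 1)]


def swn_rho(lag):
    """ACF of strict white noise."""
    return 1.0 if lag == 0 else 0.0


def ar1_rho(lag, phi=0.8):
    """ACF of a causal AR(1) process."""
    return phi ** abs(lag)


w_swn = bartlett_w(swn_rho, 3)
w_ar1 = bartlett_w(ar1_rho, 3)
```

For strict white noise the result is the identity matrix, while for the AR(1) example the first diagonal element works out to $1 - \phi_1^2$, so the lag-one sample autocorrelation is less variable than under white noise.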


The condition $\sum_{i=0}^{\infty} i\psi_i^2 < \infty$ holds for ARMA processes, so ARMA processes
driven by SWN fall under the scope of this theorem (regardless of whether fourth 
moments exist for the innovations). 

Trivially, the theorem also applies to SWN itself. For SWN we have 


$$\sqrt{n}\,\hat{\boldsymbol{\rho}}(h) \xrightarrow{d} N_h(\boldsymbol{0}, I_h),$$


where $I_h$ denotes the $h \times h$ identity matrix, so for sufficiently large n the sample autocorrelations
of data from an SWN process will behave like iid normal observations
with mean 0 and variance $1/n$. Ninety-five per cent of the estimated correlations
should lie in the interval $(-1.96/\sqrt{n}, 1.96/\sqrt{n})$, and it is for this reason that correlograms
are drawn with confidence bands at these values. If more than 5% of
estimated correlations lie outside these bounds, then this is considered as evidence 
against the null hypothesis that the data are strict white noise. 
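
A small simulation can make the role of the confidence bands concrete. The sketch below draws Gaussian strict white noise, computes the sample ACF and counts how many estimated correlations fall outside $\pm 1.96/\sqrt{n}$. The sample size, number of lags and random seed are arbitrary choices, and the observed fraction will vary from run to run around 5%.

```python
import math
import random

# Sketch: fraction of sample autocorrelations of SWN outside the 95% bands.

def sample_acf(x, max_lag):
    """Sample ACF with divisor n, for lags 0..max_lag."""
    n = len(x)
    mean = sum(x) / n

    def acov(h):
        return sum((x[t + h] - mean) * (x[t] - mean) for t in range(n - h)) / n

    gamma0 = acov(0)
    return [acov(h) / gamma0 for h in range(max_lag + 1)]


rng = random.Random(42)                      # arbitrary seed
n, max_lag = 2000, 40
x = [rng.gauss(0.0, 1.0) for _ in range(n)]
acf = sample_acf(x, max_lag)
band = 1.96 / math.sqrt(n)
outside = sum(1 for r in acf[1:] if abs(r) > band)
fraction_outside = outside / max_lag
```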


Remark 4.14. In light of the discussion of the asymptotic behaviour of sample 
autocorrelations for SWN, it might be asked how these estimators behave for white 
noise. However, this is an extremely general question because white noise encom- 
passes a variety of possible underlying processes (including the standard ARCH 
and GARCH processes we later address) that only share second-order properties 
(finiteness of variance and lack of serial correlation). In some cases the standard 
Gaussian confidence bands apply; in some cases they do not. For a GARCH process
the critical issue turns out to be the heaviness of the tail of the stationary distribution 
(see Mikosch and Starica (2000) for more details). 


Portmanteau tests. It is often useful to combine the visual analysis of the correlogram
with a formal numerical test of the strict white noise hypothesis, and a popular
test is that of Ljung and Box, as applied in Section 3.1.1. Under the null hypothesis
of SWN, the statistic

$$Q_{\mathrm{LB}} = n(n+2) \sum_{j=1}^{h} \frac{\hat{\rho}(j)^2}{n-j}$$

has an asymptotic chi-squared distribution with h degrees of freedom. This statistic
is generally preferred to the simpler Box-Pierce statistic $Q_{\mathrm{BP}} = n \sum_{j=1}^{h} \hat{\rho}(j)^2$,
which also has an asymptotic $\chi_h^2$ distribution under the null hypothesis, although
the chi-squared approximation may not be so good in smaller samples. These tests
are the most commonly applied portmanteau tests.
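
Both statistics are simple functions of the sample ACF. The following sketch computes them for a set of hypothetical sample autocorrelations (the values and the sample size are made up for illustration) and compares them with the 95% quantile of the chi-squared distribution with three degrees of freedom, hard-coded here from standard tables.

```python
# Sketch: Ljung-Box and Box-Pierce statistics from a list of sample autocorrelations.

def ljung_box(acf, n, h):
    """Q_LB = n (n + 2) * sum_{j=1}^{h} acf[j]^2 / (n - j); acf[0] is lag zero."""
    return n * (n + 2) * sum(acf[j] ** 2 / (n - j) for j in range(1, h + 1))


def box_pierce(acf, n, h):
    """Q_BP = n * sum_{j=1}^{h} acf[j]^2."""
    return n * sum(acf[j] ** 2 for j in range(1, h + 1))


hypothetical_acf = [1.0, 0.2, -0.1, 0.05]   # illustrative values, not real data
n, h = 100, 3
q_lb = ljung_box(hypothetical_acf, n, h)
q_bp = box_pierce(hypothetical_acf, n, h)
CHI2_3_95 = 7.815                           # 95% quantile of chi-squared(3), from tables
reject_swn = q_lb > CHI2_3_95
```

Because $(n+2)/(n-j) > 1$ for every lag, the Ljung-Box statistic always exceeds the Box-Pierce statistic computed from the same correlations.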

If a series of rvs forms an SWN process, then the series of absolute or squared 
variables must also be iid. It is a good idea to also apply the correlogram and Ljung-Box
tests to absolute values as a further test of the SWN hypothesis. We prefer to
perform tests of the SWN hypothesis on the absolute values rather than the squared 
values because the squared series is only an SWN (according to the definition we 
use) when the underlying series has a finite fourth moment. Daily log-return data 
often point to models with an infinite fourth moment. 


4.1.4 Statistical Analysis of Time Series 


In practice, the statistical analysis of time-series data $X_1, \ldots, X_n$ follows a programme
consisting of the following stages.


Preliminary analysis. The data are plotted and the plausibility of a single stationary 
model is considered. There are also a number of formal numerical tests of stationarity 
that may be carried out at this point (see Notes and Comments for details). 

Since we concentrate here on differenced logarithmic value series, we will assume 
that at most minor preliminary manipulation of our data is required. Classical time- 
series analysis has many techniques for removing trends and seasonalities from 
“non-stationary” data; these techniques are discussed in all standard texts, including 
Brockwell and Davis (2002) and Chatfield (2003). While certain kinds of financial 
time series, such as earnings time series, certainly do show seasonal patterns, we 
will assume that such effects are relatively minor in the kinds of daily or weekly 
return series that are the basis of risk-management methods. If we were to base our 
risk management on high-frequency data, preliminary cleaning would be more of 
an issue, since these show clear diurnal cycles and other deterministic features (see 
Dacorogna et al. 2001). 

Obviously, the assumption of stationarity becomes more questionable if we take 
long data windows, or if we choose windows in which well-known economic policy 
shifts have taken place. Although the markets change constantly there will always be 
a tension between our desire to use the most up-to-date data and our need to include
enough data to have precision in statistical estimation. Whether half a year of data, 
one year, five years or ten years are appropriate will depend on the situation. It is 
certainly a good idea to perform a number of analyses with different data windows 
and to investigate the sensitivity of statistical inference to the amount of data. 


Analysis in the time domain. Having settled on the data, the techniques of Section 4.1.3
come into play. By applying correlograms and portmanteau tests such as
Ljung-Box to both the raw data and their absolute values, the SWN hypothesis is
evaluated. If it cannot be rejected for the data in question, then the formal time-series 
analysis is over and simple distributional fitting could be used instead of dynamic 
modelling. 

For daily risk-factor return series we expect to quickly reject the SWN hypothesis. 
Despite the fact that correlograms of the raw data may show little evidence of serial 
correlation, correlograms of the absolute data are likely to show evidence of strong 
serial dependence. In other words, the data may support a white noise model but not 
a strict white noise model. In this case, ARMA modelling is not required, but the 
volatility models of Section 4.2 may be useful. 

If the correlogram does provide evidence of the kind of serial correlation patterns 
produced by ARMA processes, then we can attempt to fit ARMA processes to data. 


Model fitting. A traditional approach to model fitting first attempts to identify the 
order of a suitable ARMA process using the correlogram and a further tool known 
as the partial correlogram (not described in this book but found in all standard texts). 
For example, the presence of a cut-off at lag q in the correlogram (see Example 4.10) 
is taken as a diagnostic for pure moving-average behaviour of order q (and simi- 
lar behaviour in a partial correlogram indicates pure AR behaviour). With modern 
computing power it is now quite easy to simply fit a variety of MA, AR and ARMA 
models and to use a model-selection criterion like that of Akaike (described in Sec- 
tion A.3.6) to choose the “best” model. There are also automated model choice 
procedures such as the method of Tsay and Tiao (1984). 

Sometimes there are a priori reasons for expecting certain kinds of model to 
be most appropriate. For example, suppose we analyse longer-period returns that 
overlap, as in (3.4). Consider the case where the raw data are daily returns and we 
build weekly returns. In (3.4) we set $h = 5$ (to get weekly returns) and $k = 1$ (to get
as much data as possible). Assuming that the underlying data are genuinely from a
white noise process $(X_t)_{t \in \mathbb{Z}} \sim \mathrm{WN}(0, \sigma^2)$, the weekly aggregated returns at times $t$
and $t + l$ satisfy

$$\operatorname{cov}(X_t^{(5)}, X_{t+l}^{(5)}) = \operatorname{cov}\Big(\sum_{i=0}^{4} X_{t-i}, \sum_{j=0}^{4} X_{t+l-j}\Big) = \begin{cases} (5 - l)\sigma^2, & l = 0, \ldots, 4, \\ 0, & l \ge 5, \end{cases}$$

so that the overlapping returns have the correlation structure of an MA(4) process,
and this would be a natural choice of time-series model for them.
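
The covariance above simply counts the daily returns shared by the two weekly windows. The sketch below makes this counting argument explicit; the function name `overlap_cov` and its arguments are illustrative.

```python
# Sketch: covariance of overlapping h-period sums of white noise by counting shared terms.

def overlap_cov(h, lag, sigma2=1.0):
    """cov of two h-period sums whose end points are `lag` periods apart."""
    first = set(range(-(h - 1), 1))               # indices t - i, i = 0..h-1, with t = 0
    second = set(range(lag - (h - 1), lag + 1))   # indices t + lag - j, j = 0..h-1
    return sigma2 * len(first & second)


weekly_cov = [overlap_cov(5, lag) for lag in range(7)]
weekly_acf = [c / weekly_cov[0] for c in weekly_cov]
```

The resulting ACF is $(5 - l)/5$ for $l \le 4$ and zero afterwards, which is the cut-off pattern of an MA(4) process.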

Having chosen the model to fit, there are a number of possible fitting methods, 
including specialized methods for AR processes, such as Yule-Walker, that make 
minimal assumptions concerning the distribution of the white noise innovations;
we refer to the standard time-series literature for more details. In Section 4.2.4 we 
discuss the method of (conditional) maximum likelihood, which may be used to fit 
ARMA models with (or without) GARCH errors to data. 


Residual analysis and model comparison. Recall the representation of a causal 
and invertible ARMA process in (4.11), and suppose we have fitted such a process 
and estimated the parameters $\phi_i$ and $\theta_j$. The residuals are inferred values $\hat{\varepsilon}_t$ for the
unobserved innovations $\varepsilon_t$ and they are calculated recursively from the data and the
fitted model using the equations

$$\hat{\varepsilon}_t = X_t - \hat{\mu}_t, \qquad \hat{\mu}_t = \hat{\mu} + \sum_{i=1}^{p} \hat{\phi}_i (X_{t-i} - \hat{\mu}) + \sum_{j=1}^{q} \hat{\theta}_j \hat{\varepsilon}_{t-j}, \tag{4.13}$$

where the values $\hat{\mu}_t$ are sometimes known as the fitted values. Obviously, we have a
problem calculating the first few values of $\hat{\varepsilon}_t$ due to the finiteness of our data sample
and the infinite nature of the recursions (4.13). One of many possible solutions is to
set $\hat{\varepsilon}_{-q+1} = \hat{\varepsilon}_{-q+2} = \cdots = \hat{\varepsilon}_0 = 0$ and $X_{-p+1} = X_{-p+2} = \cdots = X_0 = \bar{X}$ and
then to use (4.13) for $t = 1, \ldots, n$. Since the first few values will be influenced by
these starting values, they might be ignored in later analyses.
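
The recursion (4.13) with the starting values just described can be sketched as follows. The function name and argument layout are our own assumptions; the parameter estimates are taken as given rather than fitted here.

```python
# Sketch: residuals of a fitted ARMA(p, q) model via the recursion (4.13).

def arma_residuals(x, mu, phi, theta):
    """Residuals with pre-sample residuals set to 0 and pre-sample X set to the sample mean.

    phi = [phi_1, ..., phi_p] and theta = [theta_1, ..., theta_q] are estimates.
    """
    xbar = sum(x) / len(x)
    resid = []
    for t in range(len(x)):
        fitted = mu
        for i in range(1, len(phi) + 1):
            past_x = x[t - i] if t - i >= 0 else xbar
            fitted += phi[i - 1] * (past_x - mu)
        for j in range(1, len(theta) + 1):
            past_resid = resid[t - j] if t - j >= 0 else 0.0
            fitted += theta[j - 1] * past_resid
        resid.append(x[t] - fitted)
    return resid


ma1_resid = arma_residuals([1.0, 2.0, 0.5], mu=0.0, phi=[], theta=[0.5])
ar1_resid = arma_residuals([1.0, 2.0], mu=0.0, phi=[0.5], theta=[])
```

In the MA(1) toy example the residuals are 1, 1.5 and -0.25, each obtained by subtracting half of the previous residual from the current observation.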

The residuals $(\hat{\varepsilon}_t)$ should behave like a realization of a white noise process,
since this is our model assumption for the innovations, and this can be assessed by 
constructing their correlogram. If there is still evidence of serial correlation in the 
correlogram, then this suggests that a good ARMA model has not yet been found. 
Moreover, we can use portmanteau tests to test formally that the residuals behave 
like a realization of a strict white noise process. If the residuals behave like SWN, 
then no further time-series modelling is required; if they behave like WN but not 
SWN, then the volatility models of Section 4.2 may be required. 

It is usually possible to find more than one reasonable ARMA model for the data, 
and formal model-comparison techniques may be required to decide on an overall 
best model or models. The Akaike information criterion described in Section A.3.6 
might be used, or one of a number of variants on this criterion that are often preferred 
for time series (see Brockwell and Davis 2002, Section 5.5.2). 


4.1.5 Prediction 


There are many approaches to the forecasting or prediction of time series, and we 
summarize two that extend easily to the case of GARCH models. The first strategy 
makes use of fitted ARMA (or ARIMA) models and is sometimes called the Box-Jenkins
approach (Box and Jenkins 1970). The second strategy is a model-free
approach to forecasting known as exponential smoothing, which is related to the 
exponentially weighted moving-average technique for predicting volatility. 


Prediction using ARMA models. Consider the invertible ARMA model and its 
representation in (4.11). Let $\mathcal{F}_t$ denote the history of the process up to and including
time $t$, as before, and assume that the innovations $(\varepsilon_t)_{t \in \mathbb{Z}}$ have the martingale-difference
property with respect to $(\mathcal{F}_t)_{t \in \mathbb{Z}}$.

For the prediction problem it will be convenient to denote our sample of n data by
$X_{t-n+1}, \ldots, X_t$. We assume that these are realizations of rvs following a particular
ARMA model. Our aim is to predict $X_{t+1}$ or, more generally, $X_{t+h}$, and we denote
our prediction by $P_t X_{t+h}$. The method we describe assumes that we have access
to the infinite history of the process up to time $t$ and derives a formula that is then
approximated for our finite sample.

As a predictor of $X_{t+h}$ we use the conditional expectation $E(X_{t+h} \mid \mathcal{F}_t)$. Among
all predictions $P_t X_{t+h}$ based on the infinite history of the process up to time $t$, this
predictor minimizes the mean squared prediction error $E((X_{t+h} - P_t X_{t+h})^2)$.

The basic idea is that, for $h \ge 1$, the prediction $E(X_{t+h} \mid \mathcal{F}_t)$ is recursively
evaluated in terms of $E(X_{t+h-1} \mid \mathcal{F}_t)$. We use the fact that $E(\varepsilon_{t+h} \mid \mathcal{F}_t) = 0$ (the
martingale-difference property of innovations) and that the rvs $(X_s)_{s \le t}$ and $(\varepsilon_s)_{s \le t}$
are “known” at time $t$. The assumption of invertibility (4.10) ensures that the innovation
$\varepsilon_t$ can be written as a function of the infinite history of the process $(X_s)_{s \le t}$.
To illustrate the approach it will suffice to consider an ARMA(1, 1) model, the 
generalization to ARMA(p, q) models following easily. 


Example 4.15 (prediction for the ARMA(1, 1) model). Suppose an ARMA(1, 1) 
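model of the form (4.11) has been fitted to the data, and its parameters $\mu$, $\phi_1$ and $\theta_1$
have been determined. Our one-step prediction for $X_{t+1}$ is

$$E(X_{t+1} \mid \mathcal{F}_t) = \mu_{t+1} = \mu + \phi_1 (X_t - \mu) + \theta_1 \varepsilon_t,$$

since $E(\varepsilon_{t+1} \mid \mathcal{F}_t) = 0$. For a two-step prediction we get

$$E(X_{t+2} \mid \mathcal{F}_t) = E(\mu_{t+2} \mid \mathcal{F}_t) = \mu + \phi_1 (E(X_{t+1} \mid \mathcal{F}_t) - \mu) = \mu + \phi_1^2 (X_t - \mu) + \phi_1 \theta_1 \varepsilon_t,$$

and in general we have

$$E(X_{t+h} \mid \mathcal{F}_t) = \mu + \phi_1^h (X_t - \mu) + \phi_1^{h-1} \theta_1 \varepsilon_t.$$

Without knowing all historical values of $(X_s)_{s \le t}$ this predictor cannot be evaluated
exactly, because we do not know $\varepsilon_t$ exactly, but it can be accurately approximated if n
is reasonably large. The easiest way of doing this is to substitute the model residual $\hat{\varepsilon}_t$
calculated from (4.13) for $\varepsilon_t$. Note that $\lim_{h \to \infty} E(X_{t+h} \mid \mathcal{F}_t) = \mu$, almost surely,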
so that the prediction converges to the estimate of the unconditional mean of the 
process for longer time horizons. 
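
The recursive and closed-form versions of the ARMA(1, 1) predictor can be compared in a few lines. In the sketch below the parameter values, the last observation and the last residual are hypothetical numbers chosen for illustration.

```python
# Sketch: h-step predictions for a fitted ARMA(1, 1) model.

def arma11_forecast(mu, phi, theta, x_t, eps_t, horizon):
    """Recursive predictions E(X_{t+h} | F_t) for h = 1, ..., horizon."""
    preds = [mu + phi * (x_t - mu) + theta * eps_t]
    for _ in range(horizon - 1):
        # Future innovations have conditional mean zero, so only the AR part remains.
        preds.append(mu + phi * (preds[-1] - mu))
    return preds


def arma11_forecast_closed(mu, phi, theta, x_t, eps_t, h):
    """Closed-form h-step prediction from Example 4.15."""
    return mu + phi ** h * (x_t - mu) + phi ** (h - 1) * theta * eps_t


mu, phi, theta = 0.1, 0.6, 0.5        # hypothetical fitted parameters
x_t, eps_t = 1.1, 0.4                 # hypothetical last observation and residual
preds = arma11_forecast(mu, phi, theta, x_t, eps_t, 50)
```

The first prediction is 0.9, and by horizon 50 the prediction is numerically indistinguishable from the unconditional mean 0.1.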


Exponential smoothing. This is a popular technique that is used for both prediction 
of time-series and trend estimation. Here we do not necessarily assume that the data 
come from a stationary model, although we do assume that there is no deterministic 
seasonal component in the model. In general, the method is less well suited to return 
series with frequently changing signs and is better suited to undifferenced price or 
value series. It forms the basis of a very common method of volatility prediction 
(see Section 4.2.5). 

Suppose our data represent realizations of rvs $Y_{t-n+1}, \ldots, Y_t$, considered without
reference to any concrete parametric model. As a forecast for $Y_{t+1}$ we use a prediction
of the form

$$P_t Y_{t+1} = \sum_{i=0}^{n-1} \lambda (1 - \lambda)^i Y_{t-i}, \quad \text{where } 0 < \lambda < 1.$$

Thus we weight the data from most recent to most distant with a sequence of exponentially
decreasing weights that sum to almost one. It is easily calculated that

$$P_t Y_{t+1} = \sum_{i=0}^{n-1} \lambda (1 - \lambda)^i Y_{t-i} = \lambda Y_t + (1 - \lambda) \sum_{j=0}^{n-2} \lambda (1 - \lambda)^j Y_{t-1-j} = \lambda Y_t + (1 - \lambda) P_{t-1} Y_t, \tag{4.14}$$

so that the prediction at time $t$ is obtained from the prediction at time $t - 1$ by a
simple recursive scheme. The choice of $\lambda$ is subjective; the larger the value, the
more weight is put on the most recent observation. Empirical validation studies
with different data sets can be used to determine a value of $\lambda$ that gives good results;
Chatfield (2003) reports that values between 0.1 and 0.3 are commonly used in 
practice. 
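
The equivalence between the weighted sum and the recursion (4.14) is easy to verify numerically. The sketch below starts the recursion from zero, which corresponds to an empty history and makes the two computations agree exactly; the function names and the toy data are illustrative.

```python
# Sketch: exponential smoothing, direct weighted sum versus recursive update.

def ewma_direct(y, lam):
    """sum_{i=0}^{n-1} lam (1 - lam)^i y_{t-i}, with y[-1] the most recent value."""
    n = len(y)
    return sum(lam * (1.0 - lam) ** i * y[n - 1 - i] for i in range(n))


def ewma_recursive(y, lam):
    """Recursion (4.14), started from zero (equivalent to an empty history)."""
    pred = 0.0
    for obs in y:
        pred = lam * obs + (1.0 - lam) * pred
    return pred


y, lam = [1.0, 2.0, 3.0, 4.0], 0.25
direct = ewma_direct(y, lam)
recursive = ewma_recursive(y, lam)
weight_sum = sum(lam * (1.0 - lam) ** i for i in range(len(y)))   # 1 - (1 - lam)^n
```

With only four observations the weights sum to about 0.68; the sum approaches one as the sample grows, which is the sense in which the weights "sum to almost one".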

Note that, although the method is commonly seen as a model-free forecasting 
technique, it can be shown to be the natural prediction method based on conditional 
expectation for a non-stationary ARIMA(0, 1, 1) model.


Notes and Comments 


There are many texts covering the subject of classical time-series analysis, including 
Box and Jenkins (1970), Priestley (1981), Abraham and Ledolter (1983), Brockwell 
and Davis (1991, 2002), Hamilton (1994) and Chatfield (2003). Our account of basic 
concepts, ARMA models and analysis in the time domain closely follows Brockwell 
and Davis (1991), which should be consulted for the rigorous background to ideas we 
can only summarize. We have not discussed analysis of time series in the frequency 
domain, which is less common for financial time series; for this subject see, again, 
Brockwell and Davis (1991) or Priestley (1981). 

For more on tests of the strict white noise hypothesis (that is, tests of randomness), 
see Brockwell and Davis (2002). Original references for the Box-Pierce and Ljung-Box
tests are Box and Pierce (1970) and Ljung and Box (1978).

There is a large econometrics literature on tests of stationarity and unit-root tests, 
where the latter are effectively tests of the null hypothesis of non-stationary random-walk
behaviour. Particular examples are the Dickey-Fuller and Phillips-Perron unit-root
tests (Dickey and Fuller 1979; Phillips and Perron 1988) and the KPSS test of
stationarity (Kwiatkowski et al. 1992). 

There is a vast literature on forecasting and prediction in linear models. A 
good non-mathematical introduction is found in Chatfield (2003). The approach 
we describe based on the infinite history of the time series is discussed in greater 
detail in Hamilton (1994). Brockwell and Davis (2002) concentrate on exact linear 
prediction methods for finite samples. A general review of exponential smoothing
is found in Gardner (1985). 


4.2 GARCH Models for Changing Volatility 


The most important models for daily risk-factor return series are addressed in this 
section. We give definitions of ARCH (autoregressive conditionally heteroscedastic) 
and GARCH (generalized ARCH) models and discuss some of their mathematical 
properties before going on to talk about their use in practice. 


4.2.1 ARCH Processes 


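Definition 4.16. Let $(Z_t)_{t \in \mathbb{Z}}$ be SWN(0, 1). The process $(X_t)_{t \in \mathbb{Z}}$ is an ARCH(p)
process if it is strictly stationary and if it satisfies, for all $t \in \mathbb{Z}$ and some strictly
positive-valued process $(\sigma_t)_{t \in \mathbb{Z}}$, the equations

$$X_t = \sigma_t Z_t, \tag{4.15}$$
$$\sigma_t^2 = \alpha_0 + \sum_{i=1}^{p} \alpha_i X_{t-i}^2, \tag{4.16}$$

where $\alpha_0 > 0$ and $\alpha_i \ge 0$, $i = 1, \ldots, p$.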


Let $\mathcal{F}_t = \sigma(\{X_s : s \le t\})$ again denote the sigma algebra representing the history
of the process up to time $t$, so that $(\mathcal{F}_t)_{t \in \mathbb{Z}}$ is the natural filtration. The construction
(4.16) ensures that $\sigma_t$ is measurable with respect to $\mathcal{F}_{t-1}$, and the process $(\sigma_t)_{t \in \mathbb{Z}}$
is said to be previsible. This allows us to calculate that, provided $E(|X_t|) < \infty$,

$$E(X_t \mid \mathcal{F}_{t-1}) = E(\sigma_t Z_t \mid \mathcal{F}_{t-1}) = \sigma_t E(Z_t \mid \mathcal{F}_{t-1}) = \sigma_t E(Z_t) = 0, \tag{4.17}$$

so that the ARCH process has the martingale-difference property with respect to
$(\mathcal{F}_t)_{t \in \mathbb{Z}}$. If the process is covariance stationary, it is simply a white noise, as discussed
in Section 4.1.1.


Remark 4.17. Note that the independence of $Z_t$ and $\mathcal{F}_{t-1}$ that we have assumed
above follows from the fact that an ARCH process must be causal, i.e. the equations
(4.15) and (4.16) must have a solution of the form $X_t = f(Z_t, Z_{t-1}, \ldots)$
for some $f$, so that $Z_t$ is independent of previous values of the process. This contrasts
with ARMA models, where the equations can have non-causal solutions (see
Brockwell and Davis 1991, Example 3.1.2). 


If we simply assume that the process is a covariance-stationary white noise (for
which we will give a condition in Proposition 4.18), then $E(X_t^2) < \infty$ and

$$\operatorname{var}(X_t \mid \mathcal{F}_{t-1}) = E(\sigma_t^2 Z_t^2 \mid \mathcal{F}_{t-1}) = \sigma_t^2 \operatorname{var}(Z_t) = \sigma_t^2.$$

Thus the model has the interesting property that its conditional standard deviation
$\sigma_t$, or volatility, is a continually changing function of the previous squared values of
the process. If one or more of $|X_{t-1}|, \ldots, |X_{t-p}|$ are particularly large, then $X_t$ is
effectively drawn from a distribution with large variance, and may itself be large; in

Figure 4.2. A simulated ARCH(1) process with Gaussian innovations and parameters
$\alpha_0 = \alpha_1 = 0.5$: (a) the realization of the process; (b) the realization of the volatility; and
correlograms of (c) the raw and (d) the squared values. The process is covariance stationary
with unit variance and a finite fourth moment (since $\alpha_1 < 1/\sqrt{3}$) and the squared values
follow an AR(1) process. The true form of the ACF of the squared values is represented by
the dashed line in the correlogram.


this way the model generates volatility clusters. The name ARCH refers to this structure:
the model is autoregressive, since $X_t$ clearly depends on previous $X_{t-i}$, and
conditionally heteroscedastic, since the conditional variance changes continually.
The distribution of the innovations $(Z_t)_{t \in \mathbb{Z}}$ can in principle be any zero-mean,
unit-variance distribution. For statistical fitting purposes we may or may not choose
to actually specify the distribution, depending on whether we implement a maximum
likelihood (ML), quasi-maximum likelihood (QML) or non-parametric fitting
method (see Section 4.2.4). For ML the most common choices are standard
normal innovations or scaled $t$ innovations. By the latter we mean that
$Z_t \sim t_1(\nu, 0, (\nu - 2)/\nu)$, in the notation of Example 6.7, so that the variance
of the distribution is one. We keep these choices in mind when discussing further
theoretical properties of ARCH and GARCH models. 


The ARCH(1) model. In the rest of this section we analyse some of the properties 
of the ARCH(1) model. These properties extend to the whole class of ARCH and 
GARCH models but are most easily introduced in the simplest case. A simulated 
realization of an ARCH(1) process with Gaussian innovations and the corresponding 
realization of the volatility process are shown in Figure 4.2. 
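
A path of this kind can be generated directly from equations (4.15) and (4.16) with $p = 1$. The sketch below uses Gaussian innovations and the parameter values of Figure 4.2; the burn-in length, the zero starting value and the seed are arbitrary assumptions, and the simulated path will of course differ from the one in the figure.

```python
import math
import random

# Sketch: simulating an ARCH(1) process with Gaussian innovations.

def simulate_arch1(alpha0, alpha1, n, burn_in=500, seed=1):
    """Return (x, sigma): n values of the process and of its volatility."""
    rng = random.Random(seed)
    x_prev = 0.0                      # arbitrary start, discarded with the burn-in
    xs, sigmas = [], []
    for t in range(n + burn_in):
        sigma = math.sqrt(alpha0 + alpha1 * x_prev ** 2)   # (4.16) with p = 1
        x_prev = sigma * rng.gauss(0.0, 1.0)               # (4.15)
        if t >= burn_in:
            xs.append(x_prev)
            sigmas.append(sigma)
    return xs, sigmas


xs, sigmas = simulate_arch1(0.5, 0.5, 20000)
sample_variance = sum(v * v for v in xs) / len(xs)   # process has mean zero
```

Large absolute values of the process are followed by large volatilities, which produces the clustering visible in plots of such paths.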

Using $X_t^2 = \sigma_t^2 Z_t^2$ and (4.16) in the case $p = 1$, we deduce that the squared
ARCH(1) process satisfies

$$X_t^2 = \alpha_0 Z_t^2 + \alpha_1 Z_t^2 X_{t-1}^2. \tag{4.18}$$

A detailed mathematical analysis of the ARCH(1) model involves the study of
equation (4.18), which is a stochastic recurrence equation (SRE). Much as for the
AR(1) model in Example 4.11, we would like to know when this equation has
stationary solutions expressed in terms of the infinite history of the innovations,
i.e. solutions of the form $X_t^2 = f(Z_t, Z_{t-1}, \ldots)$.

For ARCH models we have to distinguish carefully between solutions that are 
covariance stationary and solutions that are only strictly stationary. It is possible to 
have ARCH(1) models with infinite variance, which obviously cannot be covariance 
stationary. 


Stochastic recurrence relations. The detailed theory required to analyse stochastic 
recurrence relations of the form (4.18) is outside the scope of this book, and we give 
only brief notes to indicate the ideas involved. Our treatment is based on Brandt 
(1986), Mikosch (2003) and Mikosch (2013); see Notes and Comments at the end 
of this section for further references. 

Equation (4.18) is a particular example of a class of recurrence equations of the
form

$$Y_t = A_t Y_{t-1} + B_t, \tag{4.19}$$

where $(A_t)_{t \in \mathbb{Z}}$ and $(B_t)_{t \in \mathbb{Z}}$ are sequences of iid rvs. Sufficient conditions for a
solution are that

$$E(\ln^+ |B_t|) < \infty \quad \text{and} \quad E(\ln |A_t|) < 0, \tag{4.20}$$

where $\ln^+ x = \max(0, \ln x)$. The unique solution is given by

$$Y_t = B_t + \sum_{i=1}^{\infty} B_{t-i} \prod_{j=0}^{i-1} A_{t-j}, \tag{4.21}$$

where the sum converges absolutely, almost surely.


We can develop some intuition for the conditions (4.20) and the form of the
solution (4.21) by iterating equation (4.19) k times to obtain

$$Y_t = A_t (A_{t-1} Y_{t-2} + B_{t-1}) + B_t = B_t + \sum_{i=1}^{k} B_{t-i} \prod_{j=0}^{i-1} A_{t-j} + Y_{t-k-1} \prod_{i=0}^{k} A_{t-i}.$$

The conditions (4.20) ensure that the middle term on the right-hand side converges
absolutely and the final term disappears. In particular, note that

$$\frac{1}{k+1} \sum_{i=0}^{k} \ln |A_{t-i}| \xrightarrow{\text{a.s.}} E(\ln |A_t|) < 0$$

by the strong law of large numbers. So

$$\prod_{i=0}^{k} |A_{t-i}| = \exp\Big(\sum_{i=0}^{k} \ln |A_{t-i}|\Big) \xrightarrow{\text{a.s.}} 0,$$

which shows the importance of the $E(\ln |A_t|) < 0$ condition. The solution (4.21) to
the SRE is a strictly stationary process (being a function of iid variables $(A_s, B_s)_{s \le t}$),
and the $E(\ln |A_t|) < 0$ condition turns out to be the key to the strict stationarity of
ARCH and GARCH models. 


Stationarity of ARCH(1). The squared ARCH(1) model (4.18) is an SRE of the
form (4.19) with $A_t = \alpha_1 Z_t^2$ and $B_t = \alpha_0 Z_t^2$. Thus the conditions in (4.20) translate
into the requirements that $E(\ln^+ |\alpha_0 Z_t^2|) < \infty$, which is automatically true for the
ARCH(1) process as we have defined it, and $E(\ln(\alpha_1 Z_t^2)) < 0$. This is the condition
for a strictly stationary solution of the ARCH(1) equations, and it can be shown that
it is in fact a necessary and sufficient condition for strict stationarity (see Bougerol
and Picard 1992). From (4.21), the solution of equation (4.18) takes the form

$$X_t^2 = \alpha_0 \sum_{i=0}^{\infty} \alpha_1^i \prod_{j=0}^{i} Z_{t-j}^2. \tag{4.22}$$

If the $(Z_t)$ are standard normal innovations, then the condition for a strictly stationary
solution is approximately $\alpha_1 < 3.562$; perhaps somewhat surprisingly, if
the $(Z_t)$ are scaled $t$ innovations with four degrees of freedom and variance 1, the
condition is $\alpha_1 < 5.437$. Strict stationarity depends on the distribution of the innovations
but covariance stationarity does not; the necessary and sufficient condition
for covariance stationarity is always $\alpha_1 < 1$, as we now prove.
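
The Gaussian bound quoted above can be reproduced numerically. The condition $E(\ln(\alpha_1 Z_t^2)) < 0$ is equivalent to $\alpha_1 < \exp(-E(\ln Z_t^2))$, and for standard normal $Z_t$ we use the standard identity $E(\ln Z_t^2) = -(\gamma + \ln 2)$, where $\gamma$ is the Euler-Mascheroni constant (this identity is an assumption brought in from outside the text). The sketch below evaluates the bound in closed form and cross-checks the expectation with a crude midpoint rule; the function names, grid size and integration range are illustrative.

```python
import math

# Sketch: strict-stationarity bound for Gaussian ARCH(1), alpha_1 < exp(-E ln Z^2).

EULER_GAMMA = 0.5772156649015329


def gaussian_strict_bound():
    """Closed form 2 * exp(gamma), using E(ln Z^2) = -(gamma + ln 2) for Z ~ N(0, 1)."""
    return 2.0 * math.exp(EULER_GAMMA)


def mean_log_z2_numeric(upper=10.0, steps=200_000):
    """Midpoint-rule approximation of E(ln Z^2) for standard normal Z."""
    width = upper / steps
    total = 0.0
    for k in range(steps):
        z = (k + 0.5) * width
        density = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
        total += 2.0 * math.log(z * z) * density * width   # factor 2: symmetry in z
    return total


bound_closed = gaussian_strict_bound()
bound_numeric = math.exp(-mean_log_z2_numeric())
```

Both routes give a value close to 3.562, well above the covariance-stationarity bound of one.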


Proposition 4.18. The ARCH(1) process is a covariance-stationary white noise
process if and only if $\alpha_1 < 1$. The variance of the covariance-stationary process is
given by $\alpha_0/(1 - \alpha_1)$.


Proof. Assuming covariance stationarity, it follows from (4.18) and $E(Z_t^2) = 1$ that

$$\sigma_x^2 = E(X_t^2) = \alpha_0 + \alpha_1 E(X_{t-1}^2) = \alpha_0 + \alpha_1 \sigma_x^2.$$

Clearly, $\sigma_x^2 = \alpha_0/(1 - \alpha_1)$ and we must have $\alpha_1 < 1$.
Conversely, if $\alpha_1 < 1$, then, by Jensen’s inequality,

$$E(\ln(\alpha_1 Z_t^2)) \le \ln(E(\alpha_1 Z_t^2)) = \ln(\alpha_1) < 0,$$

and we can use (4.22) to calculate that

$$E(X_t^2) = \alpha_0 \sum_{i=0}^{\infty} \alpha_1^i = \frac{\alpha_0}{1 - \alpha_1}.$$

Figure 4.3. (a), (b) Strictly stationary ARCH(1) models with Gaussian innovations that
are not covariance stationary ($\alpha_1 = 1.2$ and $\alpha_1 = 3$, respectively). (c) A non-stationary
(explosive) process generated by the ARCH(1) equations with $\alpha_1 = 4$. Note that (b) and (c)
use a special double-logarithmic y-axis where all values less than one in modulus are plotted
at zero.


The process $(X_t)_{t \in \mathbb{Z}}$ is a martingale difference with a finite, non-time-dependent
second moment. Hence it is a white noise process. 


See Figure 4.3 for examples of non-covariance-stationary ARCH(1) models 
as well as an example of a non-stationary (explosive) process generated by the 
ARCH(1) equations. The process in Figure 4.2 is covariance stationary. 


On the stationary distribution of $X_t$. It is clear from (4.22) that the distribution of
the $(X_t)$ in an ARCH(1) model bears a complicated relationship to the distribution of
the innovations $(Z_t)$. Even if the innovations are Gaussian, the stationary distribution
of the time series is not Gaussian, but rather a leptokurtic distribution with more
slowly decaying tails.

Moreover, from (4.15) we see that the distribution of $X_t$ is a normal mixture
distribution of the kind discussed in Section 6.2. Its distribution depends on the
distribution of $\sigma_t$, which has no simple form.


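Proposition 4.19. For $m \ge 1$, the strictly stationary ARCH(1) process has finite
moments of order $2m$ if and only if $E(Z_t^{2m}) < \infty$ and $\alpha_1 < (E(Z_t^{2m}))^{-1/m}$.

Proof. We rewrite (4.22) in the form $X_t^2 = Z_t^2 \sum_{i=0}^{\infty} Y_{t,i}$ for positive rvs
$Y_{t,i} = \alpha_0 \alpha_1^i \prod_{j=1}^{i} Z_{t-j}^2$, $i \ge 1$, and $Y_{t,0} = \alpha_0$. For $m \ge 1$ the following
inequalities hold (the latter being Minkowski's inequality):

$$E(Y_{t,1}^m) + E(Y_{t,2}^m) \le E((Y_{t,1} + Y_{t,2})^m) \le \left( (E(Y_{t,1}^m))^{1/m} + (E(Y_{t,2}^m))^{1/m} \right)^m.$$

Since

$$E(X_t^{2m}) = E(Z_t^{2m}) \, E\left( \left( \sum_{i=0}^{\infty} Y_{t,i} \right)^m \right),$$

it follows that

$$E(Z_t^{2m}) \sum_{i=0}^{\infty} E(Y_{t,i}^m) \le E(X_t^{2m}) \le E(Z_t^{2m}) \left( \sum_{i=0}^{\infty} (E(Y_{t,i}^m))^{1/m} \right)^m.$$

Since $E(Y_{t,i}^m) = \alpha_0^m \alpha_1^{im} (E(Z_t^{2m}))^i$, it may be deduced that all three quantities are
finite if and only if $E(Z_t^{2m}) < \infty$ and $\alpha_1^m E(Z_t^{2m}) < 1$.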


For example, for a finite fourth moment ($m = 2$) we require $\alpha_1 < 1/\sqrt{3}$ in the
case of Gaussian innovations and $\alpha_1 < 1/\sqrt{6}$ in the case of $t$ innovations with
six degrees of freedom; for $t$ innovations with four degrees of freedom, the fourth
moment is undefined.
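
The bound in Proposition 4.19 is easy to evaluate once $E(Z_t^{2m})$ is known. The sketch below is illustrative only: the function names are hypothetical, and it relies on two standard facts, namely $E(Z^{2m}) = (2m-1)!!$ for a standard normal variable and a fourth moment of $3(\nu-2)/(\nu-4)$ for a $t$ distribution with $\nu > 4$ degrees of freedom scaled to unit variance.

```python
import math


def gaussian_even_moment(m):
    """E(Z^(2m)) for Z ~ N(0, 1), i.e. the double factorial (2m - 1)!!."""
    result = 1
    for k in range(1, 2 * m, 2):
        result *= k
    return result


def scaled_t_fourth_moment(nu):
    """E(Z^4) for a t distribution with nu > 4 df scaled to variance 1."""
    if nu <= 4:
        raise ValueError("fourth moment is not finite for nu <= 4")
    return 3.0 * (nu - 2.0) / (nu - 4.0)


def alpha1_bound(m, even_moment):
    """Upper bound on alpha1 for a finite moment of order 2m."""
    return even_moment ** (-1.0 / m)


if __name__ == "__main__":
    print(alpha1_bound(2, gaussian_even_moment(2)))    # Gaussian, m = 2
    print(alpha1_bound(2, scaled_t_fourth_moment(6)))  # scaled t, 6 df, m = 2
```

With $m = 1$ the bound reduces to $\alpha_1 < 1$, the covariance-stationarity condition of Proposition 4.18.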

Assuming the existence of a finite fourth moment, it is easy to calculate its value,
and also that of the kurtosis of the process. We square both sides of (4.18), take
expectations of both sides and then solve for $E(X_t^4)$ to obtain

$$E(X_t^4) = \frac{\alpha_0^2 E(Z_t^4)(1 - \alpha_1^2)}{(1 - \alpha_1)^2 (1 - \alpha_1^2 E(Z_t^4))}.$$

The kurtosis of the stationary distribution $\kappa_X$ can then be calculated to be

$$\kappa_X = \frac{E(X_t^4)}{E(X_t^2)^2} = \frac{\kappa_Z (1 - \alpha_1^2)}{1 - \alpha_1^2 \kappa_Z},$$

where $\kappa_Z = E(Z_t^4)$ denotes the kurtosis of the innovations. Clearly, when $\kappa_Z > 1$,
the kurtosis of the stationary distribution is inflated in comparison with that of the
innovation distribution; for Gaussian or $t$ innovations, $\kappa_X > 3$, so the stationary
distribution is leptokurtic. The kurtosis of the process in Figure 4.2 is 9.
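
The two formulas can be evaluated directly. In the following sketch the function names are hypothetical, and the parameter values $\alpha_0 = 0.5$, $\alpha_1 = 0.5$ with Gaussian innovations ($\kappa_Z = 3$) are assumed for illustration because they reproduce the kurtosis of 9 quoted above.

```python
def arch1_kurtosis(alpha1, kappa_z=3.0):
    """Kurtosis of a stationary ARCH(1) process with innovation kurtosis kappa_z."""
    if not 0.0 <= alpha1 < 1.0:
        raise ValueError("covariance stationarity requires 0 <= alpha1 < 1")
    if alpha1 ** 2 * kappa_z >= 1.0:
        raise ValueError("fourth moment is not finite")
    return kappa_z * (1.0 - alpha1 ** 2) / (1.0 - alpha1 ** 2 * kappa_z)


def arch1_fourth_moment(alpha0, alpha1, kappa_z=3.0):
    """E(X_t^4) of a stationary ARCH(1) process, when it is finite."""
    if alpha1 ** 2 * kappa_z >= 1.0:
        raise ValueError("fourth moment is not finite")
    numerator = alpha0 ** 2 * kappa_z * (1.0 - alpha1 ** 2)
    denominator = (1.0 - alpha1) ** 2 * (1.0 - alpha1 ** 2 * kappa_z)
    return numerator / denominator


if __name__ == "__main__":
    variance = 0.5 / (1.0 - 0.5)  # alpha0 / (1 - alpha1)
    print(arch1_kurtosis(0.5))
    print(arch1_fourth_moment(0.5, 0.5) / variance ** 2)
```

Both printed values agree, since the kurtosis is the fourth moment divided by the squared variance.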


Parallels with the AR(1) process. We now turn our attention to the serial dependence
structure of the squared series in the case of covariance stationarity ($\alpha_1 < 1$).
We write the squared process as

$$X_t^2 = \sigma_t^2 Z_t^2 = \sigma_t^2 + \sigma_t^2 (Z_t^2 - 1). \tag{4.23}$$

Setting $V_t = \sigma_t^2 (Z_t^2 - 1)$, we note that $(V_t)_{t \in \mathbb{Z}}$ forms a martingale-difference series,
since $E|V_t| < \infty$ and $E(V_t \mid \mathcal{F}_{t-1}) = \sigma_t^2 E(Z_t^2 - 1) = 0$. Now we rewrite (4.23) as
$X_t^2 = \alpha_0 + \alpha_1 X_{t-1}^2 + V_t$, and observe that this closely resembles an AR(1) process
for $X_t^2$, except that $V_t$ is not necessarily a white noise process. If we restrict our
attention to processes where $E(X_t^4)$ is finite, then $V_t$ has a finite and constant second
moment and is a white noise process. Under this assumption, $X_t^2$ is an AR(1) process,
according to Definition 4.7, of the form

$$\left( X_t^2 - \frac{\alpha_0}{1 - \alpha_1} \right) = \alpha_1 \left( X_{t-1}^2 - \frac{\alpha_0}{1 - \alpha_1} \right) + V_t.$$

It has mean $\alpha_0/(1 - \alpha_1)$ and we can use Example 4.11 to conclude that the
autocorrelation function is $\rho(h) = \alpha_1^{|h|}$, $h \in \mathbb{Z}$. Figure 4.2 shows an example of an ARCH(1)
process with finite fourth moment whose squared values follow an AR(1) process.


4.2.2 GARCH Processes 


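Definition 4.20. Let $(Z_t)_{t \in \mathbb{Z}}$ be SWN(0, 1). The process $(X_t)_{t \in \mathbb{Z}}$ is a GARCH(p, q)
process if it is strictly stationary and if it satisfies, for all $t \in \mathbb{Z}$ and some strictly
positive-valued process $(\sigma_t)_{t \in \mathbb{Z}}$, the equations

$$X_t = \sigma_t Z_t, \qquad \sigma_t^2 = \alpha_0 + \sum_{i=1}^{p} \alpha_i X_{t-i}^2 + \sum_{j=1}^{q} \beta_j \sigma_{t-j}^2, \tag{4.24}$$

where $\alpha_0 > 0$, $\alpha_i \ge 0$, $i = 1, \dots, p$, and $\beta_j \ge 0$, $j = 1, \dots, q$.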


The GARCH processes are generalized ARCH processes in the sense that the
squared volatility $\sigma_t^2$ is allowed to depend on previous squared volatilities, as well
as previous squared values of the process.


The GARCH(1, 1) model. In practice, low-order GARCH models are most widely
used and we will concentrate on the GARCH(1, 1) model. In this model periods of
high volatility tend to be persistent, since $|X_t|$ has a chance of being large if either
$|X_{t-1}|$ is large or $\sigma_{t-1}$ is large; the same effect can be achieved in ARCH models of
high order, but lower-order GARCH models achieve this effect more parsimoniously.
A simulated realization of a GARCH(1, 1) process with Gaussian innovations and
its volatility are shown in Figure 4.4; in comparison with the ARCH(1) model of
Figure 4.2, it is clear that the volatility persists longer at higher levels before decaying
to lower levels.
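
A realization of this kind can be generated directly from the defining equations. The following Python sketch is a minimal illustration, not the procedure used to produce Figure 4.4: the function name, the seed, the burn-in length and the choice of starting the recursion at the unconditional variance are all assumptions made here.

```python
import math
import random


def simulate_garch11(n, alpha0, alpha1, beta1, seed=0, burn_in=500):
    """Return (x, sigma): n simulated GARCH(1,1) values and their volatilities."""
    if alpha0 <= 0 or alpha1 < 0 or beta1 < 0:
        raise ValueError("need alpha0 > 0, alpha1 >= 0, beta1 >= 0")
    if alpha1 + beta1 >= 1:
        raise ValueError("this sketch assumes alpha1 + beta1 < 1")
    rng = random.Random(seed)
    # Start at the unconditional variance (a convenient, assumed choice);
    # the burn-in period reduces the influence of this starting value.
    sigma2_prev = alpha0 / (1.0 - alpha1 - beta1)
    x_prev = math.sqrt(sigma2_prev) * rng.gauss(0.0, 1.0)
    xs, sigmas = [], []
    for t in range(n + burn_in):
        # sigma_t^2 = alpha0 + alpha1 * X_{t-1}^2 + beta1 * sigma_{t-1}^2
        sigma2 = alpha0 + alpha1 * x_prev ** 2 + beta1 * sigma2_prev
        x = math.sqrt(sigma2) * rng.gauss(0.0, 1.0)  # Gaussian innovation
        if t >= burn_in:
            xs.append(x)
            sigmas.append(math.sqrt(sigma2))
        x_prev, sigma2_prev = x, sigma2
    return xs, sigmas


if __name__ == "__main__":
    x, sigma = simulate_garch11(1000, 0.5, 0.1, 0.85, seed=1)
    print(len(x), min(sigma), max(sigma))
```

Setting `beta1 = 0` gives an ARCH(1) path, which makes it easy to compare the persistence of volatility in the two models on the same innovations.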


Stationarity. It follows from (4.24) that for a GARCH(1, 1) model we have

$$\sigma_t^2 = \alpha_0 + (\alpha_1 Z_{t-1}^2 + \beta_1) \sigma_{t-1}^2, \tag{4.25}$$

which is again an SRE of the form $Y_t = A_t Y_{t-1} + B_t$, as in (4.19). This time it is an
SRE for $Y_t = \sigma_t^2$ rather than $X_t^2$, but its analysis follows easily from the ARCH(1)
case.

The condition $E(\ln |A_t|) < 0$ for a strictly stationary solution of (4.19) translates
to the condition $E(\ln(\alpha_1 Z_t^2 + \beta_1)) < 0$ for (4.25), and the general solution (4.21)
becomes

$$\sigma_t^2 = \alpha_0 + \alpha_0 \sum_{i=1}^{\infty} \prod_{j=1}^{i} (\alpha_1 Z_{t-j}^2 + \beta_1). \tag{4.26}$$

If $(\sigma_t^2)_{t \in \mathbb{Z}}$ is a strictly stationary process, then so is $(X_t)_{t \in \mathbb{Z}}$, since $X_t = \sigma_t Z_t$ and
$(Z_t)_{t \in \mathbb{Z}}$ is simply strict white noise. The solution of the GARCH(1, 1) defining
equations is then

$$X_t = Z_t \sqrt{\alpha_0 \left( 1 + \sum_{i=1}^{\infty} \prod_{j=1}^{i} (\alpha_1 Z_{t-j}^2 + \beta_1) \right)}, \tag{4.27}$$

and we can use this to derive the condition for covariance stationarity.


Proposition 4.21. The GARCH(1, 1) process is a covariance-stationary white noise
process if and only if $\alpha_1 + \beta_1 < 1$. The variance of the covariance-stationary process
is given by $\alpha_0/(1 - \alpha_1 - \beta_1)$.


Proof. We use a similar argument to Proposition 4.18 and make use of (4.27). 


Fourth moments and kurtosis. Using a similar approach to Proposition 4.19 we can
use (4.27) to derive conditions for the existence of higher moments of a
covariance-stationary GARCH(1, 1) process. For the existence of a fourth moment, a necessary
and sufficient condition is that $E((\alpha_1 Z_t^2 + \beta_1)^2) < 1$, or alternatively that

$$(\alpha_1 + \beta_1)^2 < 1 - (\kappa_Z - 1)\alpha_1^2.$$

Assuming this to be true, we calculate the fourth moment and kurtosis of $X_t$. We
square both sides of (4.25) and take expectations to obtain

$$E(\sigma_t^4) = \alpha_0^2 + (\alpha_1^2 \kappa_Z + \beta_1^2 + 2\alpha_1\beta_1) E(\sigma_t^4) + 2\alpha_0(\alpha_1 + \beta_1) E(\sigma_t^2).$$

Solving for $E(\sigma_t^4)$, recalling that $E(\sigma_t^2) = E(X_t^2) = \alpha_0/(1 - \alpha_1 - \beta_1)$, and setting
$E(X_t^4) = \kappa_Z E(\sigma_t^4)$, we obtain

$$E(X_t^4) = \frac{\alpha_0^2 \kappa_Z (1 - (\alpha_1 + \beta_1)^2)}{(1 - \alpha_1 - \beta_1)^2 (1 - \alpha_1^2 \kappa_Z - \beta_1^2 - 2\alpha_1\beta_1)},$$

from which it follows that

$$\kappa_X = \frac{\kappa_Z (1 - (\alpha_1 + \beta_1)^2)}{1 - (\alpha_1 + \beta_1)^2 - (\kappa_Z - 1)\alpha_1^2}.$$

Again it is clear that the kurtosis of $X_t$ is greater than that of $Z_t$ whenever $\kappa_Z > 1$,
such as for Gaussian and scaled $t$ innovations. The kurtosis of the GARCH(1, 1)
model in Figure 4.4 is 3.77.
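
As a quick check of the kurtosis formula, the sketch below evaluates it for $\alpha_1 = 0.1$, $\beta_1 = 0.85$ and Gaussian innovations ($\kappa_Z = 3$), the values given in the caption of Figure 4.4. The function name is hypothetical and the code is only an illustration of the formula.

```python
def garch11_kurtosis(alpha1, beta1, kappa_z=3.0):
    """Kurtosis of a covariance-stationary GARCH(1,1) process."""
    persistence_sq = (alpha1 + beta1) ** 2
    excess = (kappa_z - 1.0) * alpha1 ** 2
    # Fourth-moment condition: (alpha1 + beta1)^2 < 1 - (kappa_z - 1) * alpha1^2
    if persistence_sq >= 1.0 - excess:
        raise ValueError("fourth moment is not finite")
    return kappa_z * (1.0 - persistence_sq) / (1.0 - persistence_sq - excess)


if __name__ == "__main__":
    print(round(garch11_kurtosis(0.1, 0.85), 2))
```

With $\beta_1 = 0$ the function reduces to the ARCH(1) kurtosis formula, which gives a simple consistency check.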


Parallels with the ARMA(1, 1) process. Using the same representation as in
equation (4.23), the covariance-stationary GARCH(1, 1) process may be written as

$$X_t^2 = \alpha_0 + \alpha_1 X_{t-1}^2 + \beta_1 \sigma_{t-1}^2 + V_t,$$

where $V_t$ is a martingale difference, given by $V_t = \sigma_t^2 (Z_t^2 - 1)$. Since
$\sigma_{t-1}^2 = X_{t-1}^2 - V_{t-1}$, we may write

$$X_t^2 = \alpha_0 + (\alpha_1 + \beta_1) X_{t-1}^2 - \beta_1 V_{t-1} + V_t, \tag{4.28}$$

which begins to resemble an ARMA(1, 1) process for $X_t^2$. If we further assume that
$E(X_t^4) < \infty$, then, recalling that $\alpha_1 + \beta_1 < 1$, we have formally that

$$\left( X_t^2 - \frac{\alpha_0}{1 - \alpha_1 - \beta_1} \right) = (\alpha_1 + \beta_1) \left( X_{t-1}^2 - \frac{\alpha_0}{1 - \alpha_1 - \beta_1} \right) - \beta_1 V_{t-1} + V_t$$

is an ARMA(1, 1) process. Figure 4.4 shows an example of a GARCH(1, 1) process
with finite fourth moment whose squared values follow an ARMA(1, 1) process.

Figure 4.4. A GARCH(1, 1) process with Gaussian innovations and parameters $\alpha_0 = 0.5$,
$\alpha_1 = 0.1$, $\beta_1 = 0.85$: (a) the realization of the process; (b) the realization of the volatility; and
correlograms of (c) the raw and (d) the squared values. The process is covariance stationary
with unit variance and a finite fourth moment and the squared values follow an ARMA(1, 1)
process. The true form of the ACF of the squared values is shown by a dashed line in the
correlogram.


The GARCH(p, q) model. Higher-order ARCH and GARCH models have the
same general behaviour as ARCH(1) and GARCH(1, 1), but their mathematical
analysis becomes more tedious. The condition for a strictly stationary solution of the
defining SRE has been derived by Bougerol and Picard (1992), but it is complicated.
The necessary and sufficient condition that this solution is covariance stationary is
$\sum_{i=1}^{p} \alpha_i + \sum_{j=1}^{q} \beta_j < 1$.

A squared GARCH(p, q) process has the structure

$$X_t^2 = \alpha_0 + \sum_{i=1}^{\max(p,q)} (\alpha_i + \beta_i) X_{t-i}^2 - \sum_{j=1}^{q} \beta_j V_{t-j} + V_t,$$

where $\alpha_i = 0$ for $i = p + 1, \dots, q$ if $q > p$, or $\beta_j = 0$ for $j = q + 1, \dots, p$ if
$p > q$. This resembles the ARMA(max(p, q), q) process and is formally such a
process provided $E(X_t^4) < \infty$.


Integrated GARCH. The study of integrated GARCH (or IGARCH) processes has
been motivated by the fact that, in some applications of GARCH modelling to daily
or higher-frequency risk-factor return series, the estimated ARCH and GARCH
coefficients $(\alpha_1, \dots, \alpha_p, \beta_1, \dots, \beta_q)$ are observed to sum to a number very close to
1, and sometimes even slightly larger than 1. In a model where
$\sum_{i=1}^{p} \alpha_i + \sum_{j=1}^{q} \beta_j \ge 1$, the process has infinite variance and is thus
non-covariance-stationary. The special case where $\sum_{i=1}^{p} \alpha_i + \sum_{j=1}^{q} \beta_j = 1$ is known
as IGARCH and has received some attention.

For simplicity, consider the IGARCH(1, 1) model. We use (4.28) to conclude that
the squared process must satisfy

$$\nabla X_t^2 = X_t^2 - X_{t-1}^2 = \alpha_0 - (1 - \alpha_1) V_{t-1} + V_t,$$

where $V_t$ is a noise sequence defined by $V_t = \sigma_t^2 (Z_t^2 - 1)$ and
$\sigma_t^2 = \alpha_0 + \alpha_1 X_{t-1}^2 + (1 - \alpha_1) \sigma_{t-1}^2$. This equation is reminiscent of an
ARIMA(0, 1, 1) model (see (4.12)) for $X_t^2$, although the noise $V_t$ is not white noise, nor is it
strictly speaking a martingale difference according to Definition 4.6. $E(V_t \mid \mathcal{F}_{t-1})$ is
undefined since $E(\sigma_t^2) = E(X_t^2) = \infty$, and therefore $E|V_t|$ is undefined.


4.2.3 Simple Extensions of the GARCH Model 


Many variants and extensions of the basic GARCH model have been proposed. We 
mention only a few (see Notes and Comments for further reading). 


ARMA models with GARCH errors. We have seen that ARMA processes are driven
by a white noise $(\varepsilon_t)_{t \in \mathbb{Z}}$ and that a covariance-stationary GARCH process is an
example of a white noise. In this section we put the ARMA and GARCH models
together by setting the ARMA error $\varepsilon_t$ equal to $\sigma_t Z_t$, where $\sigma_t$ follows a GARCH
volatility specification in terms of historical values of $\varepsilon_t$. This gives us a flexible
family of ARMA models with GARCH errors that combines the features of both
model classes.


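Definition 4.22. Let $(Z_t)_{t \in \mathbb{Z}}$ be SWN(0, 1). The process $(X_t)_{t \in \mathbb{Z}}$ is said to be an
ARMA($p_1$, $q_1$) process with GARCH($p_2$, $q_2$) errors if it is covariance stationary
and satisfies difference equations of the form

$$X_t = \mu_t + \sigma_t Z_t,$$

$$\mu_t = \mu + \sum_{i=1}^{p_1} \phi_i (X_{t-i} - \mu) + \sum_{j=1}^{q_1} \theta_j (X_{t-j} - \mu_{t-j}),$$

$$\sigma_t^2 = \alpha_0 + \sum_{i=1}^{p_2} \alpha_i (X_{t-i} - \mu_{t-i})^2 + \sum_{j=1}^{q_2} \beta_j \sigma_{t-j}^2,$$

where $\alpha_0 > 0$, $\alpha_i \ge 0$, $i = 1, \dots, p_2$, $\beta_j \ge 0$, $j = 1, \dots, q_2$, and
$\sum_{i=1}^{p_2} \alpha_i + \sum_{j=1}^{q_2} \beta_j < 1$.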

To be consistent with the previous definition of an ARMA process we build the
covariance-stationarity condition for the GARCH errors into the definition. For the
ARMA process to be a causal and invertible linear process, as before, the polynomials
$\tilde\phi(z) = 1 - \phi_1 z - \cdots - \phi_{p_1} z^{p_1}$ and $\tilde\theta(z) = 1 + \theta_1 z + \cdots + \theta_{q_1} z^{q_1}$ should have no
common roots and no roots inside the unit circle.

Let $(\mathcal{F}_t)_{t \in \mathbb{Z}}$ denote the natural filtration of $(X_t)_{t \in \mathbb{Z}}$, and assume that the ARMA
model is invertible. The invertibility of the ARMA process ensures that $\mu_t$ is
$\mathcal{F}_{t-1}$-measurable as in (4.11). Moreover, since $\sigma_t$ depends on the infinite history
$(X_s - \mu_s)_{s \le t-1}$, the ARMA invertibility also ensures that $\sigma_t$ is $\mathcal{F}_{t-1}$-measurable.
Simple calculations show that $\mu_t = E(X_t \mid \mathcal{F}_{t-1})$ and $\sigma_t^2 = \operatorname{var}(X_t \mid \mathcal{F}_{t-1})$, so
that $\mu_t$ and $\sigma_t^2$ are the conditional mean and variance of the new process.


GARCH with leverage. One of the main criticisms of the standard ARCH and 
GARCH models is the rigidly symmetric way in which the volatility reacts to recent 
returns, regardless of their sign. Economic theory suggests that market information 
should have an asymmetric effect on volatility, whereby bad news leading to a fall in 
the equity value of a company tends to increase the volatility. This phenomenon has 
been called a leverage effect, because a fall in equity value causes an increase in the 
debt-to-equity ratio, or so-called leverage, of a company and should consequently 
make the stock more volatile. At a less theoretical level it seems reasonable that 
falling stock values might lead to a higher level of investor nervousness than rises 
in value of the same magnitude. 

One method of adding a leverage effect to a GARCH(1, 1) model is by introducing
an additional parameter into the volatility equation (4.24) to get

$$\sigma_t^2 = \alpha_0 + \alpha_1 (X_{t-1} + \delta |X_{t-1}|)^2 + \beta_1 \sigma_{t-1}^2. \tag{4.29}$$

We assume that $\delta \in [-1, 1]$ and $\alpha_1 \ge 0$, as in the GARCH(1, 1) model. Observe
that (4.29) may be written as

$$\sigma_t^2 = \begin{cases} \alpha_0 + \alpha_1 (1 + \delta)^2 X_{t-1}^2 + \beta_1 \sigma_{t-1}^2, & X_{t-1} \ge 0, \\ \alpha_0 + \alpha_1 (1 - \delta)^2 X_{t-1}^2 + \beta_1 \sigma_{t-1}^2, & X_{t-1} < 0, \end{cases}$$

and hence that

$$\frac{\partial \sigma_t^2}{\partial X_{t-1}^2} = \begin{cases} \alpha_1 (1 + \delta)^2, & X_{t-1} \ge 0, \\ \alpha_1 (1 - \delta)^2, & X_{t-1} < 0. \end{cases}$$

The response of volatility to the magnitude of the most recent return depends on
the sign of that return, and we generally expect $\delta < 0$, so bad news has the greater
effect.


Threshold GARCH. Observe that (4.29) may easily be rewritten in the form

$$\sigma_t^2 = \alpha_0 + \tilde\alpha_1 X_{t-1}^2 + \tilde\delta I_{\{X_{t-1} < 0\}} X_{t-1}^2 + \beta_1 \sigma_{t-1}^2, \tag{4.30}$$

where $\tilde\alpha_1 = \alpha_1 (1 + \delta)^2$ and $\tilde\delta = -4\delta\alpha_1$. Equation (4.30) gives the most common
version of a threshold GARCH (or TGARCH) model. In effect, a threshold has been
set at level zero, and at time $t$ the dynamics depend on whether the previous value of
the process $X_{t-1}$ (or innovation $Z_{t-1}$) was below or above this threshold. However,
it is also possible to set non-zero thresholds in TGARCH models, so this represents
a more general class of model than GARCH with leverage.
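
The equivalence of (4.29) and (4.30) can be confirmed numerically. In the sketch below the function names and the parameter values are hypothetical and chosen only for illustration; the two functions implement the leverage form and the threshold form of the one-step volatility update.

```python
def sigma2_leverage(x_prev, sigma2_prev, alpha0, alpha1, beta1, delta):
    """Squared volatility from the leverage form (4.29)."""
    return (alpha0 + alpha1 * (x_prev + delta * abs(x_prev)) ** 2
            + beta1 * sigma2_prev)


def sigma2_threshold(x_prev, sigma2_prev, alpha0, alpha1, beta1, delta):
    """Squared volatility from the threshold form (4.30)."""
    alpha1_tilde = alpha1 * (1.0 + delta) ** 2
    delta_tilde = -4.0 * delta * alpha1
    indicator = 1.0 if x_prev < 0 else 0.0
    return (alpha0 + alpha1_tilde * x_prev ** 2
            + delta_tilde * indicator * x_prev ** 2 + beta1 * sigma2_prev)


if __name__ == "__main__":
    params = dict(alpha0=0.1, alpha1=0.1, beta1=0.8, delta=-0.2)
    for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
        print(x, sigma2_leverage(x, 1.0, **params),
              sigma2_threshold(x, 1.0, **params))
```

With $\delta < 0$, a negative return produces a larger squared volatility than a positive return of the same magnitude, which is the leverage effect described above.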

In a less common version of threshold GARCH, the coefficients of the GARCH
effects depend on the signs of previous values of the process; this gives a first-order
process of the form

$$\sigma_t^2 = \alpha_0 + \alpha_1 X_{t-1}^2 + \beta_1 \sigma_{t-1}^2 + \delta I_{\{X_{t-1} < 0\}} \sigma_{t-1}^2. \tag{4.31}$$


Remark 4.23. Note, also, that a further way to introduce asymmetry into a GARCH 
model is to explicitly use an asymmetric innovation distribution (albeit normalized 
to have mean 0 and variance 1). Candidate distributions could come from the gen- 
eralized hyperbolic family of Section 6.2.3. 


4.2.4 Fitting GARCH Models to Data 


Building the likelihood. In practice, the most widely used approach to fitting 
GARCH models to data is maximum likelihood. We consider in turn the fitting of the 
ARCH(1) and GARCH(1, 1) models, from which the fitting of general ARCH(p) 
and GARCH(p, q) models easily follows. 

For the ARCH(1) and GARCH(1, 1) models, suppose we have a total of $n + 1$
data values $X_0, X_1, \dots, X_n$. It is useful to recall that we can write the joint density
of the corresponding rvs as

$$f_{X_0, \dots, X_n}(x_0, \dots, x_n) = f_{X_0}(x_0) \prod_{t=1}^{n} f_{X_t \mid X_{t-1}, \dots, X_0}(x_t \mid x_{t-1}, \dots, x_0). \tag{4.32}$$

For the pure ARCH(1) process, which is first-order Markovian, the conditional
densities $f_{X_t \mid X_{t-1}, \dots, X_0}$ in (4.32) depend on the past only through the value of $\sigma_t$ or,
equivalently, $X_{t-1}$. The conditional density is easily calculated to be

$$f_{X_t \mid X_{t-1}, \dots, X_0}(x_t \mid x_{t-1}, \dots, x_0) = f_{X_t \mid X_{t-1}}(x_t \mid x_{t-1}) = \frac{1}{\sigma_t} f_Z\left( \frac{x_t}{\sigma_t} \right), \tag{4.33}$$

where $\sigma_t = (\alpha_0 + \alpha_1 x_{t-1}^2)^{1/2}$ and $f_Z(z)$ denotes the density of the innovations
$(Z_t)_{t \in \mathbb{Z}}$. We recall that this must have mean 0 and variance 1, and typical choices
would be the standard normal density or the density of a $t$ distribution scaled to have
unit variance.

However, the marginal density $f_{X_0}$ in (4.32) is not known in a tractable closed form
for ARCH and GARCH models, and this poses a problem for basing a likelihood
on (4.32). The solution employed in practice is to construct the conditional likelihood
given $X_0$, which is calculated from

$$f_{X_1, \dots, X_n \mid X_0}(x_1, \dots, x_n \mid x_0) = \prod_{t=1}^{n} f_{X_t \mid X_{t-1}, \dots, X_0}(x_t \mid x_{t-1}, \dots, x_0). \tag{4.34}$$

For the ARCH(1) model this follows from (4.33) and is

$$L(\alpha_0, \alpha_1; X) = f_{X_1, \dots, X_n \mid X_0}(X_1, \dots, X_n \mid X_0) = \prod_{t=1}^{n} \frac{1}{\sigma_t} f_Z\left( \frac{X_t}{\sigma_t} \right),$$

with $\sigma_t = (\alpha_0 + \alpha_1 X_{t-1}^2)^{1/2}$. For an ARCH(p) model we would use analogous
arguments to write down a likelihood conditional on the first $p$ values.

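In the GARCH(1, 1) model, $\sigma_t$ is recursively defined in terms of $\sigma_{t-1}$, and here,
instead of using (4.34), we construct the joint density of $X_1, \dots, X_n$ conditional on
realized values of both $X_0$ and $\sigma_0$, which is

$$f_{X_1, \dots, X_n \mid X_0, \sigma_0}(x_1, \dots, x_n \mid x_0, \sigma_0) = \prod_{t=1}^{n} f_{X_t \mid X_{t-1}, \dots, X_0, \sigma_0}(x_t \mid x_{t-1}, \dots, x_0, \sigma_0).$$

The conditional densities $f_{X_t \mid X_{t-1}, \dots, X_0, \sigma_0}$ depend on the past only through the value
of $\sigma_t$, which is given recursively from $\sigma_0, X_0, \dots, X_{t-1}$ using
$\sigma_t^2 = \alpha_0 + \alpha_1 X_{t-1}^2 + \beta_1 \sigma_{t-1}^2$. This gives us the conditional likelihood

$$L(\alpha_0, \alpha_1, \beta_1; X) = \prod_{t=1}^{n} \frac{1}{\sigma_t} f_Z\left( \frac{X_t}{\sigma_t} \right), \qquad \sigma_t = \sqrt{\alpha_0 + \alpha_1 X_{t-1}^2 + \beta_1 \sigma_{t-1}^2}.$$

The problem remains that the value of $\sigma_0^2$ is not actually observed, and this is usually
solved by choosing a starting value, such as the sample variance of $X_1, \dots, X_n$, or
even simply zero.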
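
For Gaussian innovations the logarithm of this conditional likelihood is straightforward to evaluate. The following Python function is a minimal sketch, not a production routine: the function and argument names are hypothetical, the innovation density is assumed to be standard normal, and the default starting value for $\sigma_0^2$ is the sample variance of $X_1, \dots, X_n$ (with divisor $n$).

```python
import math


def garch11_gaussian_loglik(x, alpha0, alpha1, beta1, sigma2_0=None):
    """Conditional Gaussian log-likelihood of x[1:], given x[0] and sigma2_0."""
    if len(x) < 2:
        raise ValueError("need at least two observations")
    if sigma2_0 is None:
        # Starting value: sample variance of X_1, ..., X_n (one common choice).
        tail = x[1:]
        mean = sum(tail) / len(tail)
        sigma2_0 = sum((v - mean) ** 2 for v in tail) / len(tail)
    sigma2_prev = sigma2_0
    loglik = 0.0
    for t in range(1, len(x)):
        # sigma_t^2 = alpha0 + alpha1 * X_{t-1}^2 + beta1 * sigma_{t-1}^2
        sigma2 = alpha0 + alpha1 * x[t - 1] ** 2 + beta1 * sigma2_prev
        # ln[(1 / sigma_t) * phi(x_t / sigma_t)], phi the N(0, 1) density
        loglik += -0.5 * (math.log(2.0 * math.pi) + math.log(sigma2)
                          + x[t] ** 2 / sigma2)
        sigma2_prev = sigma2
    return loglik


if __name__ == "__main__":
    data = [0.0, 1.0, -1.0]
    print(garch11_gaussian_loglik(data, alpha0=1.0, alpha1=0.0, beta1=0.0))
```

A numerical optimizer would then maximize this function over $(\alpha_0, \alpha_1, \beta_1)$ subject to the parameter constraints; a scaled $t$ density could be substituted for the Gaussian term in the same loop.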

For a GARCH(p, q) model, we would assume that we had $n + p$ data values
labelled $X_{-p+1}, \dots, X_0, X_1, \dots, X_n$. We would evaluate the likelihood conditional
on the (observed) values of $X_{-p+1}, \dots, X_0$ as well as the (unobserved) values of
$\sigma_{-q+1}, \dots, \sigma_0$, for which starting values would be used as above. For example, if
$p = 1$ and $q = 3$, we require starting values for $\sigma_0$, $\sigma_{-1}$ and $\sigma_{-2}$.

A similar approach can be used to develop a likelihood for an ARMA model with
GARCH errors. In this case we would end up with a conditional likelihood of the
form

$$L(\theta; X) = \prod_{t=1}^{n} \frac{1}{\sigma_t} f_Z\left( \frac{X_t - \mu_t}{\sigma_t} \right),$$

where $\sigma_t$ follows a GARCH specification and $\mu_t$ follows an ARMA specification
as in Definition 4.22, and all unknown parameters (possibly including unknown
parameters of the innovation distribution) have been collected in the vector $\theta$. We
could of course also consider models with leverage or threshold effects.

Deriving parameter estimates. Consider, then, a log-likelihood of the form

$$\ln L(\theta; X) = \sum_{t=1}^{n} l_t(\theta), \tag{4.35}$$

where $l_t$ denotes the log-likelihood contribution arising from the $t$th observation.
The maximum likelihood estimate $\hat\theta$ maximizes the (conditional) log-likelihood
in (4.35) and, being in general a local maximum, solves the likelihood equations

$$\frac{\partial}{\partial \theta} \ln L(\theta; X) = \sum_{t=1}^{n} \frac{\partial l_t(\theta)}{\partial \theta} = 0, \tag{4.36}$$

where the left-hand side is also known as the score vector of the conditional
likelihood. The equations (4.36) are usually solved numerically using so-called modified
Newton-Raphson procedures. A particular method that is widely used for GARCH
models is the BHHH method of Berndt et al. (1974).

In describing the behaviour of parameter estimates in the following paragraphs, 
we distinguish two situations. In the first situation we assume that the model that 
has been fitted has been correctly specified, so that the data are truly generated by a 
time-series model with both the assumed dynamic form and innovation distribution. 
We describe the asymptotic behaviour of the maximum likelihood estimates (MLEs) 
under this idealization. 

In the second situation we assume that the correct dynamic form is fitted but that 
the innovations are erroneously assumed to be Gaussian. Under this misspecification, 
the model fitting procedure is known as quasi-maximum likelihood (QML) and the 
estimates obtained are QMLEs. Essentially, the Gaussian likelihood is treated as an 
objective function to be maximized rather than a proper likelihood; our intuition 
suggests that this may still give reasonable parameter estimates, and this turns out 
to be the case under appropriate assumptions about the true innovation distribution. 


Properties of MLEs. It helps to recall at this point the asymptotic distribution 
theory for MLEs in the classical iid case, which is summarized in Section A.3. The 
asymptotic results we give for GARCH models have a similar form to the results 
in the iid case, but it is important to realize that this is not simply an application 
of these results. The asymptotics have been separately and laboriously derived in a 
series of papers for which starting references are given in Notes and Comments. We 
will give results for pure GARCH models without ARMA components or additional 
leverage structure, which have been studied rigorously, but the form of the results 
will apply more generally. 

For a pure GARCH(p, q) model with Gaussian innovations it can be shown that
(assuming the model has been correctly specified)

$$\sqrt{n} (\hat\theta_n - \theta) \xrightarrow{d} N_{p+q+1}(0, I(\theta)^{-1}),$$

where

$$I(\theta) = E\left( \frac{\partial l_t(\theta)}{\partial \theta} \frac{\partial l_t(\theta)}{\partial \theta'} \right) = -E\left( \frac{\partial^2 l_t(\theta)}{\partial \theta \, \partial \theta'} \right) \tag{4.37}$$

is the Fisher information matrix arising from any single observation. Thus we have
consistent and asymptotically normal estimates of the GARCH parameters. In practice,
the expected information matrix $I(\theta)$ is approximated by an observed information
matrix, and here we could take the observed information matrix coming from
either of the equivalent forms for the expected information matrix in (4.37). That is,
we could use

$$\bar I(\theta) = \frac{1}{n} \sum_{t=1}^{n} \frac{\partial l_t(\theta)}{\partial \theta} \frac{\partial l_t(\theta)}{\partial \theta'} \quad \text{or} \quad \bar J(\theta) = -\frac{1}{n} \sum_{t=1}^{n} \frac{\partial^2 l_t(\theta)}{\partial \theta \, \partial \theta'}, \tag{4.38}$$

where the first matrix is said to have outer-product form and the second is said to
have Hessian form. These matrices are estimated by evaluating them at the MLEs
to get $\bar I(\hat\theta)$ or $\bar J(\hat\theta)$. In practice, the derivatives of the log-likelihood at the MLE are
often approximated using first- and second-order differences.

If the model is correctly specified, the estimates $\bar I(\hat\theta)$ and $\bar J(\hat\theta)$ should be broadly
similar, being estimators based on two different expressions for the same Fisher
information matrix. In practice, we could also estimate $I(\theta)$ by
$\bar J(\hat\theta) \bar I(\hat\theta)^{-1} \bar J(\hat\theta)$, and
this anticipates the so-called sandwich estimator that is used in the QML procedure.


Properties of QMLEs. In this approach we assume that the true data-generating
mechanism is a GARCH(p, q) model with non-Gaussian innovations, but we
attempt to estimate the parameters of the process by maximizing the likelihood
for a GARCH(p, q) model with Gaussian innovations. We still obtain consistent
estimators of the model parameters and, if the true innovation distribution has a
finite fourth moment, we again get asymptotic normality; however, the form of the
asymptotic covariance matrix changes.

We now distinguish between matrices $I(\theta)$ and $J(\theta)$, given by

$$I(\theta) = E\left( \frac{\partial l_t(\theta)}{\partial \theta} \frac{\partial l_t(\theta)}{\partial \theta'} \right), \qquad J(\theta) = -E\left( \frac{\partial^2 l_t(\theta)}{\partial \theta \, \partial \theta'} \right),$$

where the expectation is now taken with respect to the true model (not the
misspecified Gaussian model). The matrices $I(\theta)$ and $J(\theta)$ differ in general (unless the
Gaussian model is correct). It may be shown that

$$\sqrt{n} (\hat\theta_n - \theta) \xrightarrow{d} N_{p+q+1}(0, J(\theta)^{-1} I(\theta) J(\theta)^{-1}), \tag{4.39}$$

and the asymptotic covariance matrix is said to be of sandwich form; it can be
estimated by $\bar J(\hat\theta)^{-1} \bar I(\hat\theta) \bar J(\hat\theta)^{-1}$, where $\bar I(\theta)$ and $\bar J(\theta)$ are defined in (4.38). If
the model-checking procedures described below suggest that the dynamics have
been adequately described by the GARCH model, but the Gaussian assumption
seems doubtful, then standard errors for parameter estimates should be based on
this covariance matrix estimate.


Model checking. As with ARMA models, it is usual to check fitted GARCH models
using residuals. We consider a general ARMA-GARCH model of the form
$X_t - \mu_t = \varepsilon_t = \sigma_t Z_t$, with $\mu_t$ and $\sigma_t$ as in Definition 4.22. In this model we distinguish
between unstandardized and standardized residuals. The former are the residuals
$\hat\varepsilon_1, \dots, \hat\varepsilon_n$ from the ARMA part of the model; they are calculated using the approach
in (4.13), and under the hypothesized model they should behave like a realization of
a pure GARCH process. The latter are reconstructed realizations of the SWN that
is assumed to drive the GARCH part of the model, and they are calculated from the
former by

$$\hat Z_t = \hat\varepsilon_t / \hat\sigma_t, \qquad \hat\sigma_t^2 = \hat\alpha_0 + \sum_{i=1}^{p_2} \hat\alpha_i \hat\varepsilon_{t-i}^2 + \sum_{j=1}^{q_2} \hat\beta_j \hat\sigma_{t-j}^2. \tag{4.40}$$

To use (4.40) we need some initial values, and one solution is to set required starting
values of $\hat\varepsilon_t$ equal to zero and required starting values of the squared volatility
$\hat\sigma_t^2$ equal to either the sample variance or zero. Because the first few values will be
influenced by these starting values, as well as the starting values required to calculate the
unstandardized residuals, they may be ignored in later analyses.
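
For the GARCH(1, 1) case ($p_2 = q_2 = 1$) the recursion in (4.40) can be sketched as follows. The function and argument names are hypothetical; the unstandardized residuals and the parameter estimates are assumed to be available already, and the starting values follow the convention described above (zero for the residual, zero or the sample variance for the squared volatility).

```python
import math


def garch11_standardized_residuals(eps, alpha0, alpha1, beta1, sigma2_start=0.0):
    """Return (z_hat, sigma_hat) from the GARCH(1,1) case of (4.40).

    eps holds the unstandardized residuals; alpha0, alpha1, beta1 are the
    estimated parameters (alpha0 > 0); sigma2_start is the starting value
    used for the squared volatility.
    """
    eps_prev = 0.0              # required starting value of the residual
    sigma2_prev = sigma2_start  # required starting value of sigma^2
    z_hat, sigma_hat = [], []
    for e in eps:
        sigma2 = alpha0 + alpha1 * eps_prev ** 2 + beta1 * sigma2_prev
        sigma = math.sqrt(sigma2)
        sigma_hat.append(sigma)
        z_hat.append(e / sigma)
        eps_prev, sigma2_prev = e, sigma2
    return z_hat, sigma_hat


if __name__ == "__main__":
    z, s = garch11_standardized_residuals([2.0, 1.0], 1.0, 0.5, 0.25)
    print(z, s)
```

In applications the first few standardized residuals would typically be discarded before constructing correlograms or Q-Q plots, for the reason given above.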

The standardized residuals should behave like an SWN and this can be investigated 
by constructing correlograms of raw and absolute values and applying portmanteau 
tests of strict white noise, as described in Section 4.1.3. 

Assuming that the SWN hypothesis is not rejected, so that the dynamics have 
been satisfactorily captured, the validity of the distribution used in the ML fitting 
can also be investigated using Q-Q plots and goodness-of-fit tests for the normal or 
scaled t distributions. If the Gaussian likelihood does a reasonable job of estimating 
dynamics, but the residuals do not behave like iid standard normal observations, then 
the QML fitting philosophy can be adopted and standard errors can be estimated 
using the sandwich estimator implied by (4.39) above. 

This opens up the possibility of two-stage analyses, where first the dynamics are 
estimated by QML methods and then the innovation distribution is modelled using 
the residuals from the dynamic model as data. The first stage is sometimes called 
pre-whitening of the data. In the second stage we might consider using heavier-tailed 
models than the Gaussian that also allow some asymmetry in the innovations. 

A disadvantage of the two-stage approach is that the error from the time-series 
modelling propagates through to the distributional fitting in the second stage and the 
overall error is hard to quantify, but the procedure does lead to more transparency in 
model building and allows us to separate the tasks of volatility modelling and mod- 
elling the shocks that drive the process. In higher-dimensional risk-factor modelling, 
it may be a useful pragmatic approach. 


Example 4.24 (GARCH model for Microsoft log-returns). We consider the
Microsoft daily log-returns for the period 1997-2000 (1009 values), as shown in
Figure 4.5. Although the raw returns show no evidence of serial correlation (see
Figure 4.6), their absolute values do show serial correlation and they fail a Ljung-Box
test (based on the first ten estimated correlations) at the 5% level.

For these data, models with Student $t$ innovations are clearly preferred to models
with Gaussian innovations, so we adopt an ML approach to fitting models with $t$
innovations. We compare the standard GARCH(1, 1) model (with a constant mean term)
with models that incorporate ARMA structure (AR(1), MA(1) and ARMA(1, 1))
for the conditional mean; the ARMA structure seems to offer little improvement in
the model, and the basic GARCH(1, 1) model is favoured in an Akaike comparison.
However, a model with a leverage term as in (4.29) does seem to offer an improvement.
Both the raw and absolute standardized residuals obtained from this model
show no visual evidence of serial correlation (see again Figure 4.6) and they do not
fail Ljung-Box tests. The estimated degrees-of-freedom parameter of the (scaled)
$t$ distribution is 6.30 (the standard error is 1.07) and a Q-Q plot of the residuals
against this reference distribution reveals a satisfactory correspondence (see
Figure 4.7). The estimates of the remaining parameters (with standard errors) in this
model are given in Table 4.1.

Figure 4.5. Microsoft log-returns 1997-2000; data and estimate of volatility from a
GARCH(1, 1) model with a leverage term. (a) Original series. (b) Conditional standard devi-

Figure 4.6. Microsoft log-returns 1997-2000; correlograms of data ((a) raw and (b)
absolute values) and residuals ((c) raw and (d) absolute values) from a GARCH(1, 1) model.

Figure 4.7. Microsoft log-returns 1997-2000; Q-Q plot of residuals from a GARCH(1, 1)
model with leverage against a Student $t$ distribution with 6.30 degrees of freedom.

Table 4.1. Analysis of Microsoft log-returns for the period 1997-2000; ML estimates of
parameters and standard errors for a GARCH(1, 1) model with a leverage term under the
assumption of $t$ innovations.

Parameter    Estimate                Standard error          Ratio
$\mu$        $9.35 \times 10^{-4}$   $7.21 \times 10^{-4}$    1.30
$\alpha_0$   $7.79 \times 10^{-5}$   $3.07 \times 10^{-5}$    2.54
$\alpha_1$   0.108                   0.0369                   2.91
$\beta_1$    0.778                   0.0673                  11.57
$\delta$     -0.178                  0.123                   -1.45


4.2.5 Volatility Forecasting and Risk Measure Estimation 


In this section we assume that our underlying model is a strictly and
covariance-stationary time-series process $(X_t)$ adapted to a filtration $(\mathcal{F}_t)$ satisfying equations
of the form

$$X_t = \mu_t + \sigma_t Z_t, \tag{4.41}$$

where $\mu_t$ and $\sigma_t$ are $\mathcal{F}_{t-1}$-measurable and $Z_t$ is an innovation variable with mean 0
and variance 1 that is independent of $\mathcal{F}_{t-1}$. Examples fitting into the framework
of (4.41) are any of the ARCH and GARCH models discussed in this chapter as well
as causal and invertible ARMA models with GARCH errors.

Our task is to forecast $\sigma_{t+h}$ for $h \ge 1$ based on a sample of $n$ data $X_{t-n+1}, \dots, X_t$,
which are assumed to be generated by the process (4.41). As in Section 4.1.5 we
assume that we have observed the infinite history of the process up to time $t$ and
derive prediction formulas that we adapt to take account of the finiteness of the
sample.

Since

$$E(\sigma_{t+h}^2 \mid \mathcal{F}_t) = E((X_{t+h} - \mu_{t+h})^2 \mid \mathcal{F}_t),$$

our forecasting problem is closely related to the problem of predicting
$(X_{t+h} - \mu_{t+h})^2$, and we can use a similar approach to prediction to the one described in
Section 4.1.5. We first derive prediction equations under explicit assumptions about
the underlying model (i.e. when we specify the structure of $\sigma_t$ and $\mu_t$ in (4.41))
before presenting the more ad hoc technique of exponentially weighted
moving-average (EWMA) prediction. Finally, we describe how forecasts of volatility form
the basis for estimates of value-at-risk and expected shortfall.


GARCH-based volatility prediction. Assume that a GARCH model has been fitted 
and its parameters estimated; we will suppress estimator notation for the parameters 
in the remainder of the section. We make calculations for two simple models, from 
which the general procedure for more complex models should be clear. 


Example 4.25 (prediction in the GARCH(1, 1) model). Suppose that we use 
a pure GARCH(1, 1) model as in Definition 4.20, which conforms to (4.41) with 
lt = 0. Since E (Xi+n | Ft) = 0 (the martingale-difference property of the GARCH 
process), optimal predictions of X;, are zero. A natural prediction of X z 1 based 
on F; is its conditional mean ofa given by 


E(X}11 | Fi) = ofp = ao +1X? + io, 


and, if E(x?) < œ, this is the optimal squared error prediction. Note that the 
prediction of the random variable X Z 1 based on the information F; is the value of 
of , Which is known at time t, being a function of the history of the process. 

In practice, we have to make an approximation based on this formula because the 
infinite series of past values that would allow us to calculate o7 is not available to us. 
A natural approach in applications is to approximate o by an estimate of squared 
volatility 6? calculated from the residual equations (4.40). Our approximate forecast 
of X 2 1 also functions as an estimate of the squared volatility at time ¢ + 1 and is 
given by 


62.) = E(X?,, | Fi) = œo +01 X? + B67. (4.42) 


Thus equation (4.42) can be thought of as a recursive scheme for estimating volatility 
one step ahead. 
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Figure 4.8. Estimate of volatility for the final days of the year 2000 and predictions of 
volatility for the first ten days of 2001 based on a GARCH(1, 1) model (without leverage) 
fitted to the Microsoft return data in Example 4.24. 


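When we look $h > 1$ steps ahead given the information at time $t$, both $X_{t+h}^2$ and
$\sigma_{t+h}^2$ are rvs. Their predictions coincide and are


$$\begin{aligned}
E(X_{t+h}^2 \mid \mathcal{F}_t) &= E(\sigma_{t+h}^2 \mid \mathcal{F}_t) \\
&= \alpha_0 + \alpha_1 E(X_{t+h-1}^2 \mid \mathcal{F}_t) + \beta_1 E(\sigma_{t+h-1}^2 \mid \mathcal{F}_t) \\
&= \alpha_0 + (\alpha_1 + \beta_1) E(X_{t+h-1}^2 \mid \mathcal{F}_t),
\end{aligned}$$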


so that a general formula is 


$$E(X_{t+h}^2 \mid \mathcal{F}_t) = \alpha_0 \sum_{i=0}^{h-1} (\alpha_1 + \beta_1)^i + (\alpha_1 + \beta_1)^{h-1} (\alpha_1 X_t^2 + \beta_1 \sigma_t^2),$$


and we obtain a practical formula by substituting an estimate of squared volatility $\hat{\sigma}_t^2$
as before. Provided that $\alpha_1 + \beta_1 < 1$, we observe that
$E(X_{t+h}^2 \mid \mathcal{F}_t) \to \alpha_0/(1 - \alpha_1 - \beta_1)$ as $h \to \infty$, almost
surely, so that the prediction of squared volatility converges to the unconditional vari-
ance of the process. A concrete example of volatility prediction in a GARCH(1, 1)
model is given in Figure 4.8 for the Microsoft data analysed in Example 4.24.
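
The recursion and the general formula of Example 4.25 can be checked against each other
numerically. The following Python sketch is illustrative only: the function names and the
parameter values used below are our own assumptions and are not estimates from the
Microsoft data.

```python
def garch11_forecast_recursive(alpha0, alpha1, beta1, x_t, sigma2_t, h):
    """Predict X_{t+h}^2 given F_t by iterating the GARCH(1, 1) recursion.

    Hypothetical helper: x_t is the last observation and sigma2_t is the
    (estimated) squared volatility at time t.
    """
    if h < 1:
        raise ValueError("h must be a positive integer")
    # One-step-ahead prediction, as in (4.42).
    forecast = alpha0 + alpha1 * x_t ** 2 + beta1 * sigma2_t
    # Each further step applies E_k = alpha0 + (alpha1 + beta1) * E_{k-1}.
    for _ in range(h - 1):
        forecast = alpha0 + (alpha1 + beta1) * forecast
    return forecast


def garch11_forecast_closed_form(alpha0, alpha1, beta1, x_t, sigma2_t, h):
    """Same prediction computed from the general (closed-form) formula."""
    if h < 1:
        raise ValueError("h must be a positive integer")
    persistence = alpha1 + beta1
    geometric_sum = sum(persistence ** i for i in range(h))
    last_term = alpha1 * x_t ** 2 + beta1 * sigma2_t
    return alpha0 * geometric_sum + persistence ** (h - 1) * last_term


# Illustrative (assumed) parameter values with alpha1 + beta1 < 1.
params = dict(alpha0=0.05, alpha1=0.10, beta1=0.85, x_t=2.0, sigma2_t=1.5)
ten_step = garch11_forecast_recursive(h=10, **params)
long_run = params["alpha0"] / (1.0 - params["alpha1"] - params["beta1"])
```

For large h both functions return values close to long_run, the unconditional variance,
in line with the limit stated in the example.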


We now consider a second example, which combines what we know about pre- 
diction in ARMA and GARCH models. 


Example 4.26 (prediction in an ARMA(1, 1)-GARCH(1, 1) model). Suppose 
that we use an ARMA(1, 1) model with GARCH(1, 1) errors as in Definition 4.22. 
This also conforms to (4.41), and prediction formulas for this model follow easily
from Examples 4.15 and 4.25. We calculate that


$$E(X_{t+h} \mid \mathcal{F}_t) = \mu + \phi_1^h (X_t - \mu) + \phi_1^{h-1} \theta_1 \varepsilon_t, \tag{4.43}$$

$$\operatorname{var}(X_{t+h} \mid \mathcal{F}_t) = \alpha_0 \sum_{i=0}^{h-1} (\alpha_1 + \beta_1)^i + (\alpha_1 + \beta_1)^{h-1} (\alpha_1 \varepsilon_t^2 + \beta_1 \sigma_t^2), \tag{4.44}$$


and these are approximated by substituting inferred values for $\varepsilon_t$ and $\sigma_t$ obtained
from the residual equations (4.40). Equation (4.43) yields predictions of $\mu_{t+h}$ or
$X_{t+h}$, and equation (4.44) yields predictions of $(X_{t+h} - \mu_{t+h})^2$ or $\sigma_{t+h}^2$.


Exponential smoothing for volatility. Now suppose that we do not want to make
detailed assumptions about the structure of $\sigma_t$ and $\mu_t$ in (4.42). We consider a simpler
scheme for forecasting volatility that builds on the exponential smoothing idea of
Section 4.1.5. We recall from (4.14) that a forecast $P_t X_{t+1}$ of $X_{t+1}$ based on time-$t$
information can be constructed using an updating scheme of the form


$$P_t X_{t+1} = \lambda X_t + (1 - \lambda) P_{t-1} X_t \tag{4.45}$$


for an appropriately chosen value of the parameter $\lambda$. If we apply this scheme to the
prediction of $(X_{t+1} - \mu_{t+1})^2$, we obtain


$$P_t (X_{t+1} - \mu_{t+1})^2 = \alpha (X_t - \mu_t)^2 + (1 - \alpha) P_{t-1} (X_t - \mu_t)^2 \tag{4.46}$$


for an appropriately chosen value of the parameter $\alpha$. Of course, in addition to
choosing $\alpha$, we also need to insert an estimate of the unobserved conditional mean
$\mu_t$ to use (4.46).

Since $\sigma_{t+1}^2 = E((X_{t+1} - \mu_{t+1})^2 \mid \mathcal{F}_t)$, we can also use (4.46) as an exponential
smoothing scheme for the unobserved squared volatility. This yields a recursive
scheme for the one-step-ahead volatility forecast given by


$$\hat{\sigma}_{t+1}^2 = \alpha (X_t - \hat{\mu}_t)^2 + (1 - \alpha) \hat{\sigma}_t^2, \tag{4.47}$$


which defines the EWMA procedure. For many risk-factor return series, the con-
ditional mean appears to be close to zero (recall the stylized facts of return series
in Section 3.1) and we often set $\hat{\mu}_t = 0$. Alternatively, we can apply the exponen-
tial smoothing idea to the conditional mean and replace $\hat{\mu}_t$ by an estimate $P_{t-1} X_t$
derived using the recursive scheme (4.45). Typical values for $\alpha$ are generally small;
for example, in the RiskMetrics methodology widely used by banks, a value of
$\alpha = 0.06$ has been recommended (Mina and Xiao 2001).
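
The EWMA recursion (4.47) is simple to implement. The Python sketch below is a minimal
illustration under stated assumptions: the function name, the zero default for the
conditional mean and, in particular, the choice of starting value for the recursion are
ours and are not prescribed by the text.

```python
def ewma_variance_path(observations, alpha=0.06, initial_variance=None, means=None):
    """Return squared-volatility forecasts from the EWMA recursion (4.47).

    The returned list has one more entry than `observations`: entry k is the
    forecast made before observation k is seen, and the final entry is the
    one-step-ahead forecast for the next, unobserved period.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    if means is None:
        # Assumption: conditional mean set to zero, as is often done.
        means = [0.0] * len(observations)
    if initial_variance is None:
        # Assumption: start from the average squared deviation of the sample.
        initial_variance = sum(
            (x - m) ** 2 for x, m in zip(observations, means)
        ) / len(observations)
    variances = [initial_variance]
    for x, m in zip(observations, means):
        variances.append(alpha * (x - m) ** 2 + (1.0 - alpha) * variances[-1])
    return variances


# Illustrative (made-up) daily losses.
path = ewma_variance_path([0.01, -0.02, 0.015], alpha=0.06, initial_variance=0.0001)
next_day_volatility = path[-1] ** 0.5
```

Because the weight on the most recent squared observation is small, the forecast reacts
gradually to new information; a different starting value mainly affects the early part
of the path.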

If we compare (4.47) with the one-step-ahead volatility estimation scheme defined
by a GARCH(1, 1) model in (4.42), it is tempting to say that EWMA corresponds to
estimating volatility using a conditional-expectation-based technique in an IGARCH
model, where the parameter $\alpha_0$ equals zero. This analogy should be used with care;
GARCH and IGARCH models with $\alpha_0 = 0$ are not well defined, and the solution
of the stochastic recurrence relation in (4.27) vanishes. Moreover, IGARCH is not
covariance stationary. It is better to regard EWMA as a sensible model-free approach 
to volatility forecasting based on the classical technique of exponential smoothing.

Estimates of VaR and expected shortfall. Finally, we suppose that the data
$X_{t-n+1}, \ldots, X_t$ can be interpreted as financial losses and we consider the application
of risk measures based on loss distributions (see Section 2.3.1) to the conditional
distribution $F_{X_{t+1} \mid \mathcal{F}_t}$. For example, the data may represent negative log-returns on an
asset price rather than returns. In particular, we look at the estimation of value-at-risk
and expected shortfall for the distribution $F_{X_{t+1} \mid \mathcal{F}_t}$.

Writing $F_Z$ for the df of the innovations $(Z_t)$, the $\mathcal{F}_t$-measurability of $\mu_{t+1}$ and
$\sigma_{t+1}$ implies that


$$F_{X_{t+1} \mid \mathcal{F}_t}(x) = P(\mu_{t+1} + \sigma_{t+1} Z_{t+1} \le x \mid \mathcal{F}_t) = F_Z((x - \mu_{t+1})/\sigma_{t+1}).$$


Let $\mathrm{VaR}_\alpha^t$ denote the $\alpha$-quantile of $F_{X_{t+1} \mid \mathcal{F}_t}$ and let $\mathrm{ES}_\alpha^t$ denote the corresponding
expected shortfall. Using the approach of Examples 2.11 and 2.14 we obtain


$$\mathrm{VaR}_\alpha^t = \mu_{t+1} + \sigma_{t+1} q_\alpha(Z), \qquad \mathrm{ES}_\alpha^t = \mu_{t+1} + \sigma_{t+1} \mathrm{ES}_\alpha(Z), \tag{4.48}$$


where we write $Z$ for a generic rv with df $F_Z$.

It is clear that if we can estimate $\mu_{t+1}$ and $\sigma_{t+1}$, then we only need to be able to
estimate $q_\alpha(Z)$ and $\mathrm{ES}_\alpha(Z)$ for the innovation distribution to obtain estimates of the
risk measures in (4.48). This task can be accomplished in both a parametric and a
non-parametric (or semi-parametric) setting. If we estimate a fully specified GARCH-type
model using the ML approach of Section 4.2.4, then it is mostly straightforward
to calculate $q_\alpha(Z)$ and $\mathrm{ES}_\alpha(Z)$ for the estimated innovation distribution. If, on the
other hand, we use a QML method to fit a GARCH-type model or, even more
simply, we use exponential smoothing techniques to estimate the volatility and
conditional mean, then we can form residuals $\hat{Z}_s = (X_s - \hat{\mu}_s)/\hat{\sigma}_s$ for
$s = t - n + 1, \ldots, t$ and apply quantile and expected shortfall estimation techniques to these
residuals; statistical methods for estimating risk measures from data are discussed
in Section 9.2.6.
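
The residual-based route to (4.48) can be sketched in a few lines of Python. This is an
illustration under our own assumptions rather than a recommended estimator: the function
names are hypothetical, and the empirical quantile and the simple tail average used here
are only the most basic of the possible estimators of $q_\alpha(Z)$ and $\mathrm{ES}_\alpha(Z)$.

```python
import math


def empirical_quantile(sample, level):
    """Smallest order statistic whose empirical df is at least `level`."""
    ordered = sorted(sample)
    index = max(math.ceil(level * len(ordered)) - 1, 0)
    return ordered[index]


def empirical_expected_shortfall(sample, level):
    """Average of the sample values at or above the empirical quantile."""
    threshold = empirical_quantile(sample, level)
    tail = [z for z in sample if z >= threshold]
    return sum(tail) / len(tail)


def conditional_risk_measures(residuals, mu_next, sigma_next, level=0.95):
    """Scale residual-based estimates by mu_{t+1} and sigma_{t+1} as in (4.48)."""
    var_estimate = mu_next + sigma_next * empirical_quantile(residuals, level)
    es_estimate = mu_next + sigma_next * empirical_expected_shortfall(residuals, level)
    return var_estimate, es_estimate


# Made-up residuals and forecasts, purely for illustration.
residuals = [-1.5, -0.5, -0.2, 0.0, 0.3, 0.8, 1.2, 2.5]
var_075, es_075 = conditional_risk_measures(residuals, mu_next=0.1, sigma_next=2.0, level=0.75)
```

With so few residuals the estimates are of course very noisy; in practice the residual
sample would be much larger and the level much closer to one.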


Notes and Comments 


The ARCH process was originally proposed by Engle (1982), and the GARCH 
process by Bollerslev (1986), who gave the condition for covariance stationarity. 
Overview texts on GARCH models include the books by Gouriéroux (1997) and 
Francq and Zakoian (2010) and a number of useful review articles including Boller- 
slev, Chou and Kroner (1992), Bollerslev, Engle and Nelson (1994) and Shephard 
(1996). There are also substantial sections on GARCH models in the books by 
Alexander (2001), Tsay (2002) and Zivot and Wang (2003). The IGARCH model 
was first discussed by Engle and Bollerslev (1986). 

The condition for strict stationarity of GARCH models was derived by Nelson 
(1990) in the case of the GARCH(1, 1) model and by Bougerol and Picard (1992) 
for GARCH(p, q). The necessary theory involves the study of stochastic recurrence 
relations and goes back to Kesten (1973); Brandt (1986) is also a useful reference. 
Readable accounts of this theory may be found in Embrechts, Klüppelberg and
Mikosch (1997), Mikosch and Stărică (2000) and Mikosch (2003, 2013).

For more on the derivation of conditional likelihood functions for ARCH and
GARCH models, see Hamilton (1994) and Tsay (2002). The BHHH algorithm 
(Berndt et al. 1974) is the most commonly used approach to numerically maximizing 
the likelihood. For an informative general discussion of numerical optimization 
procedures in the context of maximum likelihood, see Hamilton (1994, pp. 133- 
142). Standard general references on the QML approach are White (1981) and 
Gouriéroux, Montfort and Trognon (1984). 

The essential asymptotic properties of MLEs and QMLEs in GARCH models are 
described in many publications, but a detailed mathematical proof has often lagged 
behind the assertions. Early papers appealed to regularity conditions for condi- 
tionally specified models such as those of Crowder (1976), which are essentially 
unverifiable. Lee and Hansen (1994) and Lumsdaine (1996) proved consistency and 
asymptotic normality of QMLEs in the GARCH(1, 1) model. More recently, Berkes, 
Horváth and Kokoszka (2003) have extended this to the GARCH(p, q) model under
minimal assumptions, and Straumann (2005) and Straumann and Mikosch (2006) 
have given similar results for a wide variety of first-order models. 

From a more practical point of view, it is not easy to estimate GARCH model
parameters to a high degree of accuracy because of the flatness of the typical likeli- 
hoods and the non-negligible influence of starting values in finite samples. Readers 
who write their own code may wish to compare their estimates with benchmark 
studies by McCullough and Renfro (1999) and Brooks, Burke and Persand (2001). 

Alternative innovation distributions to the Gaussian and scaled t distributions that
have been considered include the generalized error distribution (GED) in Nelson 
(1991) and the normal inverse Gaussian (NIG) in Venter and de Jongh (2002); the 
latter authors present extensive evidence that the NIG is a good choice of innovation 
distribution for practical work and that GARCH inference based on the NIG is 
relatively robust to misspecification of the distribution. 

A great many extensions to the GARCH class have been proposed and thor- 
ough surveys may be found in Bollerslev, Engle and Nelson (1994) and Shephard 
(1996). Leverage effects in the GARCH model and the more general PGARCH 
(power GARCH) model are examined in Ding, Granger and Engle (1993). Various 
threshold GARCH models have been suggested; the model (4.30) is of the type sug- 
gested by Glosten, Jagannathan and Runkle (1993), while (4.31) is the switching- 
volatility GARCH (SV-GARCH) model of Fornari and Mele (1997). There have 
been proposals for non-parametric ARCH and GARCH modelling, including the 
multiplicative ARCH(p)-model of Yang, Härdle and Nielsen (1999) and the non-
parametric GARCH procedure of Bühlmann and McNeil (2002). For long-memory
processes modelling volatility, see the book by Beran et al. (2013). 

The use of the EWMA (exponentially weighted moving-average) volatility esti- 
mation method based on exponential smoothing was popularized by the RiskMetrics 
Group at JPMorgan (JPMorgan 1996; Mina and Xiao 2001). See also Zivot and Wang 
(2003) for examples of the use of this method.


Extreme Value Theory


Extreme value theory (EVT) is a branch of probability concerned with limiting laws 
for extreme values in large samples. The theory contains many important results 
describing the behaviour of sample maxima and minima, upper-order statistics (such 
as the kth largest value in a sample) and sample values exceeding high thresholds. 
Our interest in this theory centres on the application of the results to developing 
models for the extremal behaviour of financial risk factors. In particular, we are 
interested in models for the tail of the distribution of financial risk-factor changes. 
We have observed at various points in Chapters 3 and 4 that risk-factor changes are 
frequently heavy tailed when compared with a normal distribution. 

Much of this chapter is based on the presentation of EVT in Embrechts, 
Klüppelberg and Mikosch (1997) (henceforth, EKM), and whenever theoretical 
detail is missing the reader should consult that text. We concentrate on describing 
the statistical models suggested by EVT, while briefly summarizing the theoretical 
ideas on which the statistical methods are based. 

We focus on two main kinds of model for extreme values. The most traditional 
models are the block maxima models described in Section 5.1: these are models 
for the largest observations collected from large samples of identically distributed 
observations. A more modern and powerful group of models are those for threshold 
exceedances, described in Section 5.2. These are models for all large observations 
that exceed some high level, and they are generally considered to be the most useful 
for practical applications, due to their more efficient use of the (often limited) data 
on extreme outcomes. 

The models for threshold exceedances can be embedded in an elegant point pro- 
cess framework that simultaneously addresses their occurrence in time as well as 
the magnitude of excess losses over the threshold. This is the so-called peaks-over- 
threshold (POT) model, which is presented in Section 5.3. The POT model serves 
as a starting point for developing more dynamic descriptions of the occurrence and 
magnitude of extremes using self-exciting (Hawkes) processes. These advanced 
dynamic models are treated in Chapter 16 along with multivariate EVT. 


5.1 Maxima 


To begin with we consider a sequence of iid rvs $(X_i)_{i \in \mathbb{N}}$ representing financial losses.
These may have a variety of interpretations, such as operational losses, insurance
losses and losses on a credit portfolio over fixed time intervals. Later we relax the
assumption of independence and consider that the rvs form a strictly stationary time
series of dependent losses; they might be (negative) returns on an investment in a
single stock, an index, or a portfolio of investments.


5.1.1 Generalized Extreme Value Distribution 


Convergence of sums. The role of the generalized extreme value (GEV) distri- 
bution in the theory of extremes is analogous to that of the normal distribution 
(and, more generally, the stable laws) in the central limit theory for sums of rvs. 
Assuming that the underlying rvs $X_1, X_2, \ldots$ are iid with a finite variance, and writing
$S_n = X_1 + \cdots + X_n$ for the sum of the first $n$ rvs, the standard version of the
central limit theorem (CLT) says that appropriately normalized sums $(S_n - a_n)/b_n$
converge in distribution to the standard normal distribution as $n$ goes to infinity. The
appropriate normalization uses sequences of normalizing constants $(a_n)$ and $(b_n)$
defined by $a_n = n E(X_1)$ and $b_n = \sqrt{n \operatorname{var}(X_1)}$. In mathematical notation we have


$$\lim_{n \to \infty} P\left( \frac{S_n - a_n}{b_n} \le x \right) = \Phi(x), \quad x \in \mathbb{R}.$$


Convergence of maxima. Classical EVT is concerned with limiting distributions
for normalized maxima. We denote the maximum of $n$ iid rvs $X_1, \ldots, X_n$ by
$M_n = \max(X_1, \ldots, X_n)$ and refer to this also as an $n$-block maximum. The only possible
non-degenerate limiting distributions for normalized maxima as $n$ goes to infinity
are in the GEV family.


Definition 5.1 (generalized extreme value distribution). The df of the (standard)
GEV distribution is given by

$$H_\xi(x) = \begin{cases} \exp(-(1 + \xi x)^{-1/\xi}), & \xi \ne 0, \\ \exp(-e^{-x}), & \xi = 0, \end{cases}$$

where $1 + \xi x > 0$. A three-parameter family is obtained by defining
$H_{\xi,\mu,\sigma}(x) := H_\xi((x - \mu)/\sigma)$ for a location parameter $\mu \in \mathbb{R}$ and a scale parameter $\sigma > 0$.


The parameter $\xi$ is known as the shape parameter of the GEV distribution, and
$H_\xi$ defines a type of distribution, meaning a family of distributions specified up to
location and scaling (see Section A.1.1 for a formal definition). The extreme value
distribution in Definition 5.1 is generalized in the sense that the parametric form
subsumes three types of distribution that are known by other names according to
the value of $\xi$: when $\xi > 0$ the distribution is a Fréchet distribution; when $\xi = 0$
it is a Gumbel distribution; and when $\xi < 0$ it is a Weibull distribution. We also
note that for fixed $x$ we have $\lim_{\xi \to 0} H_\xi(x) = H_0(x)$ (from either side), so that the
parametrization in Definition 5.1 is continuous in $\xi$, which facilitates the use of this
distribution in statistical modelling.

The df and density of the GEV distribution are shown in Figure 5.1 for the three
cases $\xi = 0.5$, $\xi = 0$ and $\xi = -0.5$, corresponding to Fréchet, Gumbel and Weibull
types, respectively. Observe that the Weibull distribution is a short-tailed distribution
with a so-called finite right endpoint. The right endpoint of a distribution will be
denoted by $x_F = \sup\{x \in \mathbb{R} : F(x) < 1\}$. The Gumbel and Fréchet distributions
have infinite right endpoints, but the decay of the tail of the Fréchet distribution is
much slower than that of the Gumbel distribution.

Figure 5.1. (a) The df of a standard GEV distribution in three cases: the solid line
corresponds to $\xi = 0$ (Gumbel); the dotted line is $\xi = 0.5$ (Fréchet); and the dashed line is
$\xi = -0.5$ (Weibull). (b) Corresponding densities. In all cases, $\mu = 0$ and $\sigma = 1$.
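
The df in Definition 5.1 can be evaluated directly, which makes the three cases and the
continuity in the shape parameter easy to inspect. The following Python sketch is our own
illustration; the function name and the convention of returning 0 or 1 outside the support
are assumptions made for convenience.

```python
import math


def gev_cdf(x, xi=0.0, mu=0.0, sigma=1.0):
    """Evaluate the GEV df H_{xi, mu, sigma}(x) of Definition 5.1."""
    if sigma <= 0.0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma
    if xi == 0.0:
        # Gumbel case.
        return math.exp(-math.exp(-z))
    t = 1.0 + xi * z
    if t <= 0.0:
        # Outside the support: below the left endpoint when xi > 0 (df is 0),
        # above the finite right endpoint when xi < 0 (df is 1).
        return 0.0 if xi > 0.0 else 1.0
    return math.exp(-t ** (-1.0 / xi))


# The three cases shown in Figure 5.1, evaluated at x = 2.
frechet_value = gev_cdf(2.0, xi=0.5)
gumbel_value = gev_cdf(2.0, xi=0.0)
weibull_value = gev_cdf(2.0, xi=-0.5)  # x = 2 is the right endpoint when xi = -0.5
```

Evaluating gev_cdf for a very small non-zero xi gives values close to the Gumbel case,
which is the continuity property noted above.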

Suppose that maxima $M_n$ of iid rvs converge in distribution as $n \to \infty$ under an
appropriate normalization. Recalling that $P(M_n \le x) = F^n(x)$, we observe that
this convergence means that there exist sequences of real constants $(d_n)$ and $(c_n)$,
where $c_n > 0$ for all $n$, such that


$$\lim_{n \to \infty} P((M_n - d_n)/c_n \le x) = \lim_{n \to \infty} F^n(c_n x + d_n) = H(x) \tag{5.1}$$


for some non-degenerate df H (x). The role of the GEV distribution in the study of 
maxima is formalized by the following definition and theorem. 


Definition 5.2 (maximum domain of attraction). If (5.1) holds for some non-
degenerate df $H$, then $F$ is said to be in the maximum domain of attraction of $H$,
written $F \in \mathrm{MDA}(H)$.


Theorem 5.3 (Fisher–Tippett, Gnedenko). If $F \in \mathrm{MDA}(H)$ for some non-
degenerate df $H$, then $H$ must be a distribution of type $H_\xi$, i.e. a GEV distribution.


Remarks 5.4. 


(1) If convergence of normalized maxima takes place, the type of the limiting
distribution (as specified by $\xi$) is uniquely determined, although the location
and scaling of the limit law ($\mu$ and $\sigma$) depend on the exact normalizing
sequences chosen; this is guaranteed by the so-called convergence to types
theorem (EKM, p. 554). It is always possible to choose these sequences such
that the limit appears in the standard form $H_\xi$.


(2) By non-degenerate df we mean a distribution that is not concentrated on a 
single point.


Examples. We calculate two examples to show how the GEV limit emerges for
two well-known underlying distributions and appropriately chosen normalizing 
sequences. To discover how normalizing sequences may be constructed in general, 
we refer to Section 3.3 of EKM. 


Example 5.5 (exponential distribution). If the underlying distribution is an expo-
nential distribution with df $F(x) = 1 - e^{-\beta x}$ for $\beta > 0$ and $x \ge 0$, then by choosing
normalizing sequences $c_n = 1/\beta$ and $d_n = (\ln n)/\beta$ we can directly calculate the
limiting distribution of maxima using (5.1). We get


$$F^n(c_n x + d_n) = \left(1 - \frac{1}{n} e^{-x}\right)^n, \quad x \ge -\ln n,$$

$$\lim_{n \to \infty} F^n(c_n x + d_n) = \exp(-e^{-x}), \quad x \in \mathbb{R},$$

from which we conclude that $F \in \mathrm{MDA}(H_0)$.


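Example 5.6 (Pareto distribution). If the underlying distribution is a Pareto dis-
tribution ($\mathrm{Pa}(\alpha, \kappa)$) with df $F(x) = 1 - (\kappa/(\kappa + x))^\alpha$ for $\alpha > 0$, $\kappa > 0$ and $x \ge 0$,
we can take normalizing sequences $c_n = \kappa n^{1/\alpha}/\alpha$ and $d_n = \kappa n^{1/\alpha} - \kappa$. Using (5.1)
we get


$$F^n(c_n x + d_n) = \left(1 - \frac{1}{n}\left(1 + \frac{x}{\alpha}\right)^{-\alpha}\right)^n, \quad 1 + \frac{x}{\alpha} \ge n^{-1/\alpha},$$

$$\lim_{n \to \infty} F^n(c_n x + d_n) = \exp\left(-\left(1 + \frac{x}{\alpha}\right)^{-\alpha}\right), \quad 1 + \frac{x}{\alpha} > 0,$$

from which we conclude that $F \in \mathrm{MDA}(H_{1/\alpha})$.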
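
The convergence in Examples 5.5 and 5.6 can be observed numerically by evaluating
$F^n(c_n x + d_n)$ for a large but finite $n$ and comparing it with the stated limit. The
Python sketch below is illustrative; the function names and the chosen parameter values
are assumptions of ours.

```python
import math


def exponential_maxima_cdf(x, n, beta=1.0):
    """F^n(c_n x + d_n) for the exponential df, with c_n = 1/beta, d_n = ln(n)/beta."""
    u = x / beta + math.log(n) / beta
    if u <= 0.0:
        return 0.0
    return (1.0 - math.exp(-beta * u)) ** n


def pareto_maxima_cdf(x, n, alpha, kappa=1.0):
    """F^n(c_n x + d_n) for the Pa(alpha, kappa) df with the sequences of Example 5.6."""
    c_n = kappa * n ** (1.0 / alpha) / alpha
    d_n = kappa * n ** (1.0 / alpha) - kappa
    u = c_n * x + d_n
    if u <= 0.0:
        return 0.0
    return (1.0 - (kappa / (kappa + u)) ** alpha) ** n


def gumbel_limit(x):
    """Limit H_0(x) of Example 5.5."""
    return math.exp(-math.exp(-x))


def frechet_type_limit(x, alpha):
    """Limit H_{1/alpha}(x) of Example 5.6, valid for 1 + x/alpha > 0."""
    return math.exp(-(1.0 + x / alpha) ** (-alpha))


n = 10 ** 6
exponential_gap = abs(exponential_maxima_cdf(1.0, n, beta=2.0) - gumbel_limit(1.0))
pareto_gap = abs(pareto_maxima_cdf(1.0, n, alpha=2.0, kappa=3.0) - frechet_type_limit(1.0, 2.0))
```

Both gaps shrink as n grows, and the limits do not depend on the scale parameters
beta and kappa, which are absorbed by the normalizing sequences.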


Convergence of minima. The limiting theory for convergence of maxima encom- 
passes the limiting behaviour of minima using the identity 


min(X1,..., Xn) = — max(—X1,..., —Xn). (5.2) 


It is not difficult to see that normalized minima of iid samples with df $F$ will
converge in distribution if the df $\tilde{F}(x) = 1 - F(-x)$, which is the df of the rvs
$-X_1, \ldots, -X_n$, is in the maximum domain of attraction of an extreme value
distribution. Writing $M_n^* = \max(-X_1, \ldots, -X_n)$ and assuming that $\tilde{F} \in \mathrm{MDA}(H_\xi)$,
we have

$$\lim_{n \to \infty} P\left(\frac{M_n^* - d_n}{c_n} \le x\right) = H_\xi(x),$$

from which it follows easily, using (5.2), that

$$\lim_{n \to \infty} P\left(\frac{\min(X_1, \ldots, X_n) + d_n}{c_n} \le x\right) = 1 - H_\xi(-x).$$


Thus appropriate limits for minima are distributions of type $1 - H_\xi(-x)$. For a
symmetric distribution $F$ we have $\tilde{F}(x) = F(x)$, so that if $H_\xi$ is the limiting type of
distribution for maxima for a particular value of $\xi$, then $1 - H_\xi(-x)$ is the limiting
type of distribution for minima.


5.1.2 Maximum Domains of Attraction


For most applications it is sufficient to note that essentially all the common contin-
uous distributions of statistics or actuarial science are in $\mathrm{MDA}(H_\xi)$ for some value
of $\xi$. In this section we consider the issue of which underlying distributions lead to
which limits for maxima.


The Fréchet case. The distributions that lead to the Fréchet limit $H_\xi(x)$ for $\xi > 0$
have a particularly elegant characterization involving slowly varying or regularly 
varying functions. 


Definition 5.7 (slowly varying and regularly varying functions).


(i) A positive, Lebesgue-measurable function $L$ on $(0, \infty)$ is slowly varying at $\infty$
(written $L \in \mathcal{R}_0$) if

$$\lim_{x \to \infty} \frac{L(tx)}{L(x)} = 1, \quad t > 0.$$


(ii) A positive, Lebesgue-measurable function $h$ on $(0, \infty)$ is regularly varying
at $\infty$ with index $\rho \in \mathbb{R}$ if

$$\lim_{x \to \infty} \frac{h(tx)}{h(x)} = t^\rho, \quad t > 0.$$


Slowly varying functions are functions that, in comparison with power functions,
change relatively slowly for large $x$, an example being the logarithm $L(x) = \ln(x)$.
Regularly varying functions are functions that can be represented by power functions
multiplied by slowly varying functions, i.e. $h(x) = x^\rho L(x)$ for some $L \in \mathcal{R}_0$.


Theorem 5.8 (Fréchet MDA, Gnedenko). For $\xi > 0$,

$$F \in \mathrm{MDA}(H_\xi) \iff \bar{F}(x) = x^{-1/\xi} L(x) \text{ for some function } L \in \mathcal{R}_0, \tag{5.3}$$

where $\bar{F} = 1 - F$ denotes the tail of $F$.


This means that distributions giving rise to the Fréchet case are distributions with
tails that are regularly varying functions with a negative index of variation. Their
tails decay essentially like a power function, and the rate of decay $\alpha = 1/\xi$ is often
referred to as the tail index of the distribution. A consequence of Theorem 5.8 is
that the right endpoint of any distribution in the Fréchet MDA satisfies $x_F = \infty$.

These distributions are the most studied distributions in EVT and they are of par-
ticular interest in financial applications because they are heavy-tailed distributions
with infinite higher moments. If $X$ is a non-negative rv whose df $F$ is an element
of $\mathrm{MDA}(H_\xi)$ for $\xi > 0$, then it may be shown that $E(X^k) = \infty$ for $k > 1/\xi$
(EKM, p. 568). If, for some small $\varepsilon > 0$, the distribution is in $\mathrm{MDA}(H_{(1/2)+\varepsilon})$, it is
an infinite-variance distribution, and if the distribution is in $\mathrm{MDA}(H_{(1/4)+\varepsilon})$, it is a
distribution with infinite fourth moment.


Example 5.9 (Pareto distribution). In Example 5.6 we verified by direct calcula-
tion that normalized maxima of iid Pareto variates converge to a Fréchet distribution.
Observe that the tail of the Pareto df in (A.19) may be written $\bar{F}(x) = x^{-\alpha} L(x)$,
where it may be easily checked that $L(x) = (\kappa^{-1} + x^{-1})^{-\alpha}$ is a slowly varying
function; indeed, as $x \to \infty$, $L(x)$ converges to the constant $\kappa^\alpha$. Thus we verify
that the Pareto df has the form (5.3).
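
The factorization used in Example 5.9 is easy to verify numerically. In the Python sketch
below (function names and parameter values are our own illustrative choices) we check that
the Pareto tail equals $x^{-\alpha} L(x)$ and that $L$ behaves as a slowly varying function
should for large $x$.

```python
def pareto_tail(x, alpha, kappa):
    """Tail 1 - F(x) of the Pa(alpha, kappa) distribution for x >= 0."""
    return (kappa / (kappa + x)) ** alpha


def pareto_slowly_varying_part(x, alpha, kappa):
    """The function L(x) = (1/kappa + 1/x)^(-alpha) from Example 5.9."""
    return (1.0 / kappa + 1.0 / x) ** (-alpha)


alpha, kappa = 2.5, 3.0

# Factorization of the tail at a moderate argument.
x = 40.0
factorized_tail = x ** (-alpha) * pareto_slowly_varying_part(x, alpha, kappa)

# Behaviour of L far out in the tail: L(t x) / L(x) should be close to 1
# and L(x) should be close to the constant kappa ** alpha.
big_x = 1e12
ratio = pareto_slowly_varying_part(10.0 * big_x, alpha, kappa) / pareto_slowly_varying_part(
    big_x, alpha, kappa
)
limit_ratio = pareto_slowly_varying_part(big_x, alpha, kappa) / kappa ** alpha
```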


Further examples of distributions giving rise to the Fréchet limit for maxima 
include the Fréchet distribution itself and the inverse gamma, Student t, loggamma, 
F and Burr distributions. We will provide further demonstrations for some of these 
distributions in Section 16.1.1. 


The Gumbel case. The characterization of distributions in this class is more com-
plicated than in the Fréchet class. We have seen in Example 5.5 that the exponential
distribution is in the Gumbel class and, more generally, it could be said that the
distributions in this class have tails that have an essentially exponential decay. A
positive-valued rv with a df in $\mathrm{MDA}(H_0)$ has finite moments of any positive order,
i.e. $E(X^k) < \infty$ for every $k > 0$ (EKM, p. 148).

However, there is a great deal of variety in the tails of distributions in this class;
for example, both the normal and lognormal distributions belong to the Gumbel
class (EKM, pp. 145-147). The normal distribution, as discussed in Section 6.1.4, is
thin tailed, but the lognormal distribution has much heavier tails, and we would need
to collect a lot of data from the lognormal distribution before we could distinguish
its tail behaviour from that of a distribution in the Fréchet class. Moreover, it should
be noted that the right endpoints of distributions in this class satisfy $x_F \le \infty$, so
the case $x_F < \infty$ is possible.

In financial modelling it is often erroneously assumed that the only interesting 
models for financial returns are the power-tailed distributions of the Fréchet class. 
The Gumbel class is also interesting because it contains many distributions with 
much heavier tails than the normal, even if these are not regularly varying power 
tails. Examples are hyperbolic and generalized hyperbolic distributions (with the 
exception of the special boundary case that is Student t).

Other distributions in $\mathrm{MDA}(H_0)$ include the gamma, chi-squared, standard
Weibull (to be distinguished from the Weibull special case of the GEV distribu- 
tion) and Benktander type I and II distributions (which are popular actuarial loss 
distributions), and the Gumbel itself. We provide demonstrations for some of these 
examples in Section 16.1.2. 


The Weibull case. This is perhaps the least important case for financial modelling, 
at least in the area of market risk, since the distributions in this class all have finite 
right endpoints. Although all potential financial and insurance losses are, in practice, 
bounded, we will still tend to favour models that have infinite support for loss 
modelling. An exception may be in the area of credit risk modelling, where we will 
see in Chapter 10 that probability distributions on the unit interval [0, 1] are very 
useful. A characterization of the Weibull class is as follows. 


Theorem 5.10 (Weibull MDA, Gnedenko). For $\xi < 0$,


$$F \in \mathrm{MDA}(H_\xi) \iff x_F < \infty \text{ and } \bar{F}(x_F - x^{-1}) = x^{1/\xi} L(x)$$


for some function $L \in \mathcal{R}_0$.

It can be shown (EKM, p. 137) that a beta distribution with density $f_{\alpha,\beta}$ as
given in (A.10) is in $\mathrm{MDA}(H_{-1/\beta})$. This includes the special case of the uniform
distribution for $\beta = \alpha = 1$.


5.1.3 Maxima of Strictly Stationary Time Series 


The standard theory of the previous sections concerns maxima of iid sequences. 
With financial time series in mind, we now look briefly at the theory for maxima 
of strictly stationary time series and find that the same types of limiting distribution 
apply. 

In this section let $(X_i)_{i \in \mathbb{Z}}$ denote a strictly stationary time series with sta-
tionary distribution $F$, and let $(\tilde{X}_i)_{i \in \mathbb{Z}}$ denote the associated iid process, i.e. a
strict white noise process with the same df $F$. Let $M_n = \max(X_1, \ldots, X_n)$ and
$\tilde{M}_n = \max(\tilde{X}_1, \ldots, \tilde{X}_n)$ denote maxima of the original series and the iid series,
respectively.


For many processes $(X_i)_{i \in \mathbb{N}}$, it may be shown that there exists a real number $\theta$
in $(0, 1]$ such that


$$\lim_{n \to \infty} P\left(\frac{\tilde{M}_n - d_n}{c_n} \le x\right) = H(x) \tag{5.4}$$


for a non-degenerate limit $H(x)$ if and only if


$$\lim_{n \to \infty} P\left(\frac{M_n - d_n}{c_n} \le x\right) = H^\theta(x). \tag{5.5}$$


For such processes this value $\theta$ is known as the extremal index of the process (not to be
confused with the tail index of distributions in the Fréchet class). A formal definition
is more technical (see Notes and Comments) but the basic ideas behind (5.4) and (5.5)
are easily explained.

For processes with an extremal index, normalized maxima converge in distribution
provided that maxima of the associated iid process converge in distribution: that is,
provided the underlying distribution $F$ is in $\mathrm{MDA}(H_\xi)$ for some $\xi$. Moreover, since
$H_\xi^\theta(x)$ can be easily verified to be a distribution of the same type as $H_\xi(x)$, the
limiting distribution of the normalized maxima of the dependent series is a GEV
distribution with exactly the same $\xi$ parameter as the limit for the associated iid
data; only the location and scaling of the distribution may change.

Writing $u = c_n x + d_n$, we observe that, for large enough $n$, (5.4) and (5.5) imply
that

$$P(M_n \le u) \approx P^\theta(\tilde{M}_n \le u) = F^{n\theta}(u), \tag{5.6}$$


so that for $u$ large, the probability distribution of the maximum of $n$ observations
from the time series with extremal index $\theta$ can be approximated by the distribution
of the maximum of $n\theta < n$ observations from the associated iid series. In a sense,
$n\theta$ can be thought of as counting the number of roughly independent clusters of
observations in $n$ observations, and $\theta$ is often interpreted as the reciprocal of the
mean cluster size.

Table 5.1. Approximate values of the extremal index as a function of
the parameter $\alpha_1$ for the ARCH(1) process in (4.22).


$\alpha_1$   0.1     0.3     0.5     0.7     0.9
$\theta$     0.999   0.939   0.835   0.721   0.612
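
The practical effect of the approximation (5.6) can be illustrated with a short
calculation. In the Python sketch below the function name, the sample size of 250 and the
level $u$ with $F(u) = 0.999$ are purely illustrative assumptions; the value 0.612 of the
extremal index is the Table 5.1 entry for $\alpha_1 = 0.9$.

```python
def clustered_maximum_cdf(marginal_cdf_at_u, n, theta):
    """Approximate P(M_n <= u) by F(u) ** (n * theta), following (5.6)."""
    if not 0.0 < theta <= 1.0:
        raise ValueError("theta must lie in (0, 1]")
    if not 0.0 <= marginal_cdf_at_u <= 1.0:
        raise ValueError("marginal_cdf_at_u must be a probability")
    return marginal_cdf_at_u ** (n * theta)


# Probability that none of 250 observations exceeds u when F(u) = 0.999.
iid_probability = clustered_maximum_cdf(0.999, 250, theta=1.0)
arch_probability = clustered_maximum_cdf(0.999, 250, theta=0.612)
```

Under these assumptions the dependent series is more likely than the iid series to stay
below the level $u$ over the whole sample, because its exceedances tend to arrive in
clusters rather than being spread evenly over time.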


Not every strictly stationary process has an extremal index (see p. 418 of EKM
for a counterexample) but, for the kinds of time-series processes that interest us in
financial modelling, an extremal index generally exists. Essentially, we only have
to distinguish between the cases when $\theta = 1$ and the cases when $\theta < 1$: for the
former, there is no tendency to cluster at high levels, and large sample maxima from
the time series behave exactly like maxima from similarly sized iid samples; for the
latter, we must be aware of a tendency for extreme values to cluster.


• Strict white noise processes (iid rvs) have extremal index $\theta = 1$.


• ARMA processes with Gaussian strict white noise innovations have $\theta = 1$
(EKM, pp. 216-218). However, if the innovation distribution is in $\mathrm{MDA}(H_\xi)$
for $\xi > 0$, then $\theta < 1$ (EKM, pp. 415, 416).


• ARCH and GARCH processes have $\theta < 1$ (EKM, pp. 476-480).


The final fact is particularly relevant to our financial applications, since we saw 
in Chapter 4 that ARCH and GARCH processes provide good models for many 
financial return series. 


Example 5.11 (the extremal index of the ARCH(1) process). In Table 5.1 we
reproduce some results from de Haan et al. (1989), who calculate approximate values
for the extremal index of the ARCH(1) process (see Definition 4.16) using a Monte
Carlo simulation approach. Clearly, the stronger the ARCH effect (that is, the larger
the magnitude of the parameter $\alpha_1$), the greater the tendency of the process to cluster.
For a process with parameter 0.9, the extremal index value $\theta = 0.612$ is interpreted
as suggesting that the average cluster size is $1/\theta = 1.64$.


5.1.4 The Block Maxima Method 


Fitting the GEV distribution. Suppose we have data from an unknown underlying
distribution $F$, which we suppose lies in the domain of attraction of an extreme value
distribution $H_\xi$ for some $\xi$. If the data are realizations of iid variables, or variables
from a process with an extremal index such as GARCH, the implication of the theory
is that the true distribution of the $n$-block maximum $M_n$ can be approximated for
large enough $n$ by a three-parameter GEV distribution $H_{\xi,\mu,\sigma}$.

We make use of this idea by fitting the GEV distribution $H_{\xi,\mu,\sigma}$ to data on the
$n$-block maximum. Obviously we need repeated observations of an $n$-block maximum,
and we assume that the data can be divided into $m$ blocks of size $n$. This makes most
sense when there are natural ways of blocking the data. The method has its origins in
hydrology, where, for example, daily measurements of water levels might be divided
into yearly blocks and the yearly maxima collected. Analogously, we will consider
financial applications where daily return data (recorded on trading days) are divided
into yearly (or semesterly or quarterly) blocks and the maximum daily falls within
these blocks are analysed.
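
The blocking step can be sketched as follows. This Python fragment is an illustration
under our own assumptions: it uses blocks of equal size (calendar blocks such as years
would generally differ slightly in length), it silently discards an incomplete final
block, and the function name is hypothetical.

```python
def block_maxima(losses, block_size):
    """Split `losses` into consecutive blocks of `block_size` and return each maximum.

    Any observations left over after the last complete block are ignored.
    """
    if block_size < 1:
        raise ValueError("block_size must be a positive integer")
    number_of_blocks = len(losses) // block_size
    return [
        max(losses[j * block_size:(j + 1) * block_size])
        for j in range(number_of_blocks)
    ]


# Made-up daily percentage falls, grouped into blocks of three observations.
daily_falls = [1.0, 5.0, 2.0, 7.0, 3.0, 0.0, 4.0]
maxima = block_maxima(daily_falls, block_size=3)
```

Here the input is assumed to hold losses (for example, negative returns), so that each
block maximum corresponds to the largest fall within the block.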

We denote the block maximum of the jth block by Mnj, so our data are 
Ma1, .--; Mnm.The GEV distribution can be fitted using various methods, including 
maximum likelihood. An alternative is the method of probability-weighted moments 
(see Notes and Comments). In implementing maximum likelihood it will be assumed 
that the block size n is quite large so that, regardless of whether the underlying data 
are dependent or not, the block maxima observations can be taken to be independent. 
In this case, writing $h_{\xi,\mu,\sigma}$ for the density of the GEV distribution, the log-likelihood 
is easily calculated to be 

$$
\begin{aligned}
l(\xi, \mu, \sigma; M_{n1}, \ldots, M_{nm})
&= \sum_{i=1}^{m} \ln h_{\xi,\mu,\sigma}(M_{ni}) \\
&= -m \ln \sigma - \left(1 + \frac{1}{\xi}\right) \sum_{i=1}^{m} \ln\left(1 + \xi \frac{M_{ni} - \mu}{\sigma}\right) - \sum_{i=1}^{m} \left(1 + \xi \frac{M_{ni} - \mu}{\sigma}\right)^{-1/\xi},
\end{aligned}
$$

which must be maximized subject to the parameter constraints that $\sigma > 0$ and 
$1 + \xi (M_{ni} - \mu)/\sigma > 0$ for all i. While this represents an irregular likelihood 
problem, due to the dependence of the parameter space on the values of the data, 
the consistency and asymptotic efficiency of the resulting MLEs can be established 
for the case when $\xi > -\tfrac{1}{2}$ using results in Smith (1985). 
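
As a rough illustration of how this log-likelihood and its constraints might be coded, consider the following Python sketch. The function name `gev_loglik` is an assumption made here. The branch for $\xi = 0$ uses the Gumbel limit of the GEV density, which is added for completeness and does not appear in the formula above. Returning minus infinity outside the parameter space is one simple way of passing the constraints to a numerical optimizer.

```python
import math


def gev_loglik(xi, mu, sigma, maxima):
    """Log-likelihood of GEV parameters (xi, mu, sigma) given block maxima.

    Returns -inf whenever a parameter constraint is violated.
    """
    if sigma <= 0:
        return -math.inf
    total = -len(maxima) * math.log(sigma)
    for m_i in maxima:
        if xi == 0:
            # Gumbel limit of the GEV density (an addition to the formula in the text).
            y = (m_i - mu) / sigma
            total -= y + math.exp(-y)
        else:
            z = 1.0 + xi * (m_i - mu) / sigma
            if z <= 0:
                return -math.inf  # this observation lies outside the support
            total -= (1.0 + 1.0 / xi) * math.log(z) + z ** (-1.0 / xi)
    return total
```

In practice the negative of this function would be handed to a general-purpose optimizer, started from sensible initial values.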

In determining the number and size of the blocks (m and n, respectively), a trade- 
off necessarily takes place: roughly speaking, a large value of n leads to a more 
accurate approximation of the block maxima distribution by a GEV distribution and 
a low bias in the parameter estimates; a large value of m gives more block maxima 
data for the ML estimation and leads to a low variance in the parameter estimates. 
Note also that, in the case of dependent data, somewhat larger block sizes than 
are used in the iid case may be advisable; dependence generally has the effect that 
convergence to the GEV distribution is slower, since the effective sample size is $n\theta$, 
which is smaller than n. 


Example 5.12 (block maxima analysis of S&P return data). Suppose we turn 
the clock back and imagine it is the early evening of Friday 16 October 1987. An 
unusually turbulent week in the equity markets has seen the S&P 500 index fall 
by 9.12%. On that Friday alone the index is down 5.16% on the previous day, the 
largest one-day fall since 1962. 

We fit the GEV distribution to annual maximum daily percentage falls in value for 
the S&P index. Using data going back to 1960, shown in Figure 5.2, gives us twenty-eight 
observations of the annual maximum fall (including the latest observation from 
the incomplete year 1987). The estimated parameter values are $\hat{\xi} = 0.29$, $\hat{\mu} = 2.03$ 
and $\hat{\sigma} = 0.72$ with standard errors 0.21, 0.16 and 0.14, respectively. Thus the fitted 
distribution is a heavy-tailed Fréchet distribution with an infinite fourth moment, 
suggesting that the underlying distribution is heavy tailed. Note that the standard 
errors imply considerable uncertainty in our analysis, as might be expected with 
only twenty-eight observations of maxima. In fact, in a likelihood ratio test of the 
null hypothesis that a Gumbel model fits the data ($H_0\colon \xi = 0$), the null hypothesis 
cannot be rejected. 

Figure 5.2. (a) S&P percentage returns for the period 1960 to 16 October 1987. (b) Annual 
maxima of daily falls in the index; superimposed is an estimate of the ten-year return level 
with associated 95% confidence interval (dotted lines). (c) Semesterly maxima of daily falls 
in the index; superimposed is an estimate of the 20-semester return level with associated 95% 
confidence interval. See Examples 5.12 and 5.15 for full details. 

To increase the number of blocks we also fit a GEV model to 56 semesterly 
maxima and obtain the parameter estimates $\hat{\xi} = 0.33$, $\hat{\mu} = 1.68$ and $\hat{\sigma} = 0.55$ with 
standard errors 0.14, 0.09 and 0.07. This model has an even heavier tail, and the null 
hypothesis that a Gumbel model is adequate is now rejected. 


Return levels and stress losses. The fitted GEV model can be used to estimate two 
related quantities that describe the occurrence of stress events. On the one hand, we 
can estimate the size of a stress event that occurs with prescribed frequency (the 
return-level problem). On the other hand, we can estimate the frequency of a stress 
event that has a prescribed size (the return-period problem). 


Definition 5.13 (return level). Let H denote the df of the true distribution of 
the n-block maximum. The k n-block return level is $r_{n,k} = q_{1-1/k}(H)$, i.e. the 
$(1 - 1/k)$-quantile of H. 


The k n-block return level can be roughly interpreted as that level which is 
exceeded in one out of every k n-blocks on average. For example, the ten-trading-year 
return level $r_{260,10}$ is that level which is exceeded in one out of every ten years 
on average. (In the notation we assume that every year has 260 trading days, although 
this is only an average and there will be slight differences from year to year.) Using 
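our fitted model we would estimate a return level by 

$$
\hat{r}_{n,k} = H^{-1}_{\hat{\xi},\hat{\mu},\hat{\sigma}}\left(1 - \frac{1}{k}\right) = \hat{\mu} + \frac{\hat{\sigma}}{\hat{\xi}}\left(\left(-\ln\left(1 - \frac{1}{k}\right)\right)^{-\hat{\xi}} - 1\right). \tag{5.7}
$$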
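
The return-level estimate (5.7) is simple to evaluate directly. The following Python sketch, in which the function name `gev_return_level` is an assumption made here, applies the GEV quantile formula to the rounded parameter estimates quoted in Example 5.12. It assumes a non-zero shape parameter.

```python
import math


def gev_return_level(k, xi, mu, sigma):
    """k-block return level: the (1 - 1/k)-quantile of a GEV with xi != 0."""
    if k <= 1:
        raise ValueError("k must exceed 1")
    y = -math.log(1.0 - 1.0 / k)
    return mu + (sigma / xi) * (y ** (-xi) - 1.0)


# Annual-maxima fit quoted in Example 5.12 (rounded estimates).
ten_year = gev_return_level(10, 0.29, 2.03, 0.72)
fifty_year = gev_return_level(50, 0.29, 2.03, 0.72)
# Semesterly fit quoted in the same example.
twenty_semester = gev_return_level(20, 0.33, 1.68, 0.55)
```

Up to rounding of the inputs, these values should agree with the point estimates of about 4.3%, 7.2% and 4.5% reported in Example 5.15.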


Definition 5.14 (return period). Let H denote the df of the true distribution of 
the n-block maximum. The return period of the event $\{M_n > u\}$ is given by 
$k_{n,u} = 1/\bar{H}(u)$. 


Observe that the return period $k_{n,u}$ is defined in such a way that the $k_{n,u}$ n-block 
return level is u. In other words, in $k_{n,u}$ n-blocks we would expect to observe a 
single block in which the level u was exceeded. If there was a strong tendency for 
the extreme values to cluster, we might expect to see multiple exceedances of the 
level within that block. Assuming that H is the df of a GEV distribution and using 
our fitted model, we would estimate the return period by $\hat{k}_{n,u} = 1/\bar{H}_{\hat{\xi},\hat{\mu},\hat{\sigma}}(u)$. 
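
A minimal Python sketch of the return-period estimate follows. The function names `gev_tail` and `gev_return_period` are assumptions made here, and the code covers only the case of a non-zero shape parameter. The inputs are the rounded annual-maxima estimates from Example 5.12.

```python
import math


def gev_tail(u, xi, mu, sigma):
    """Tail probability 1 - H(u) of a GEV distribution with xi != 0."""
    z = 1.0 + xi * (u - mu) / sigma
    if z <= 0:
        # u lies outside the support: below it when xi > 0, above it when xi < 0.
        return 1.0 if xi > 0 else 0.0
    # -expm1(-t) evaluates 1 - exp(-t) accurately for small t.
    return -math.expm1(-(z ** (-1.0 / xi)))


def gev_return_period(u, xi, mu, sigma):
    """Return period of the event {M_n > u}, measured in blocks."""
    tail = gev_tail(u, xi, mu, sigma)
    return math.inf if tail == 0.0 else 1.0 / tail


# Rounded annual-maxima estimates from Example 5.12 and a 20.5% daily fall.
period_years = gev_return_period(20.5, 0.29, 2.03, 0.72)
```

With these rounded inputs the period should come out at roughly 1,560 years, the same order of magnitude as the figure quoted in Example 5.15, which was presumably computed from unrounded estimates.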

Note that both $\hat{r}_{n,k}$ and $\hat{k}_{n,u}$ are simple functionals of the estimated parameters 
of the GEV distribution. As well as calculating point estimates for these quantities 
we should give confidence intervals that reflect the error in the parameter estimates 
of the GEV distribution. A good method is to base such confidence intervals on the 
likelihood ratio statistic, as described in Section A.3.5. To do this we reparametrize 
the GEV distribution in terms of the quantity of interest. For example, in the case 
of return level, let $\phi = H^{-1}_{\xi,\mu,\sigma}(1 - (1/k))$ and parametrize the GEV distribution by 
$\theta = (\phi, \xi, \sigma)'$ rather than $\theta = (\xi, \mu, \sigma)'$. The maximum likelihood estimate of $\phi$ 
is the estimate (5.7), and a confidence interval can be constructed according to the 
method in Section A.3.5 (see (A.28) in particular). 


Example 5.15 (stress losses for S&P return data). We continue Example 5.12 by 
estimating the ten-year return level and the 20-semester return level based on data up 
to 16 October 1987, using (5.7) for the point estimate and the likelihood ratio method 
as described above to get confidence intervals. The point estimate of the ten-year 
return level is 4.3% with a 95% confidence interval of (3.4, 7.3); the point estimate 
of the 20-semester return level is 4.5% with a 95% confidence interval of (3.5, 7.1). 
Clearly, there is some uncertainty about the size of events of this frequency even 
with 28 years or 56 semesters of data. 

The day after the end of our data set, 19 October 1987, was Black Monday. The 
index fell by the unprecedented amount of 20.5% in one day. This event is well 
outside our confidence interval for a ten-year loss. If we were to estimate a 50-year 
return level (an event beyond our experience if we have 28 years of data), then our 
point estimate would be 7.2 with a confidence interval of (4.8, 23.4), so the 1987 
crash lies close to the upper boundary of our confidence interval for a much rarer 
event. But the 28 maxima are really too few to get a reliable estimate for an event 
as rare as the 50-year event. 

If we turn the problem around and attempt to estimate the return period of a 
20.5% loss, the point estimate is 1631 years (i.e. almost a two-millennium event) 
but the 95% confidence interval encompasses everything from 42 years to essentially 
never! The analysis of semesterly maxima gives only moderately more informative 
results: the point estimate is 1950 semesters; the confidence interval runs from 121 
semesters to 3.0 x 10° semesters. In summary, on 16 October 1987 we simply did 
not have the data to say anything meaningful about an event of this magnitude. 
This illustrates the inherent difficulties of attempting to quantify events beyond our 
empirical experience. 


Notes and Comments 


The main source for this chapter is Embrechts, Klüppelberg and Mikosch (1997) 
(EKM). Further important texts on EVT include Gumbel (1958), Leadbetter, Lindgren 
and Rootzén (1983), Galambos (1987), Resnick (2008), Falk, Hüsler and Reiss 
(1994), Reiss and Thomas (1997), de Haan and Ferreira (2000), Coles (2001), Beirlant 
et al. (2004) and Resnick (2007). 

The forms of the limit law for maxima were first studied by Fisher and Tippett 
(1928). The subject was brought to full mathematical fruition in the fundamental 
papers of Gnedenko (1941, 1943). The concept of the extremal index, which appears 
in the theory of maxima of stationary series, has a long history. The first mathemat- 
ically precise definition seems to have been given by Leadbetter (1983). See also 
Leadbetter, Lindgren and Rootzén (1983) and Smith and Weissman (1994) for more 
details. The theory required to calculate the extremal index of an ARCH(1) process 
(as in Table 5.1) is found in de Haan et al. (1989) and also in EKM (pp. 473-480). 
For the GARCH(1, 1) process, consult Mikosch and Starica (2000). 

A further difficult task is the statistical estimation of the extremal index from time- 
series data under the assumption that these data do indeed come from a process with 
an extremal index. Two general methods known as the blocks and runs methods 
are described in Section 8.1.3 of EKM; these methods go back to work of Hsing 
(1991) and Smith and Weissman (1994). Although the estimators have been used 
in real-world data analyses (see, for example, Davison and Smith 1990), it remains 
true that the extremal index is a very difficult parameter to estimate accurately. 

The maximum likelihood fitting of the GEV distribution is described by Hosking 
(1985) and Hosking, Wallis and Wood (1985). Consistency and asymptotic normality 
can be demonstrated for the case $\xi > -0.5$ using results in Smith (1985). An 
alternative method known as probability-weighted moments (PWM) has been pro- 
posed by Hosking, Wallis and Wood (1985) (see also pp. 321-323 of EKM). The 
analysis of block maxima in Examples 5.12 and 5.15 is based on McNeil (1998). 
Analyses of financial data using the block maxima method may also be found in 
Longin (1996), one of the earliest papers to apply EVT methodology to financial 
data. 


5.2 Threshold Exceedances 


The block maxima method discussed in Section 5.1.4 has the major defect that it is 
very wasteful of data; to perform our analyses we retain only the maximum losses in 
large blocks. For this reason it has been largely superseded in practice by methods 
based on threshold exceedances, where we use all the data that exceed a particular 
designated high level. 
Figure 5.3. (a) Distribution function of the GPD in three cases: the solid line corresponds to 
$\xi = 0$ (exponential); the dotted line to $\xi = 0.5$ (a Pareto distribution); and the dashed line to 
$\xi = -0.5$ (Pareto type II). The scale parameter $\beta$ is equal to 1 in all cases. (b) Corresponding 
densities. 


5.2.1 Generalized Pareto Distribution 


The main distributional model for exceedances over thresholds is the generalized 
Pareto distribution (GPD). 


Definition 5.16 (GPD). The df of the GPD is given by 


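$$
G_{\xi,\beta}(x) =
\begin{cases}
1 - (1 + \xi x/\beta)^{-1/\xi}, & \xi \neq 0, \\
1 - \exp(-x/\beta), & \xi = 0,
\end{cases}
\tag{5.8}
$$

where $\beta > 0$, and $x \ge 0$ when $\xi \ge 0$ and $0 \le x \le -\beta/\xi$ when $\xi < 0$. The 
parameters $\xi$ and $\beta$ are referred to, respectively, as the shape and scale parameters. 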


Like the GEV distribution in Definition 5.1, the GPD is generalized in the sense 
that it contains a number of special cases: when $\xi > 0$ the df $G_{\xi,\beta}$ is that of an 
ordinary Pareto distribution with $\alpha = 1/\xi$ and $\kappa = \beta/\xi$ (see Section A.2.8); when 
$\xi = 0$ we have an exponential distribution; when $\xi < 0$ we have a short-tailed, 
Pareto type II distribution. Moreover, as in the case of the GEV distribution, for 
fixed x the parametric form is continuous in $\xi$, so $\lim_{\xi \to 0} G_{\xi,\beta}(x) = G_{0,\beta}(x)$. The 
df and density of the GPD for various values of $\xi$ and $\beta = 1$ are shown in Figure 5.3. 

In terms of domains of attraction we have that $G_{\xi,\beta} \in \mathrm{MDA}(H_\xi)$ for all $\xi \in \mathbb{R}$. 
Note that, for $\xi > 0$ and $\xi < 0$, this assertion follows easily from the characterizations 
in Theorems 5.8 and 5.10. In the heavy-tailed case, $\xi > 0$, it may be easily 
verified that $E(X^k) = \infty$ for $k \ge 1/\xi$. The mean of the GPD is defined provided 
$\xi < 1$ and is 

$$
E(X) = \beta/(1 - \xi). \tag{5.9}
$$
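
For readers who wish to experiment, the df in (5.8) and the mean in (5.9) can be coded as follows. This is a sketch only, and the function names `gpd_cdf` and `gpd_mean` are assumptions made here. The support conventions follow Definition 5.16.

```python
import math


def gpd_cdf(x, xi, beta):
    """Distribution function G_{xi,beta}(x) of the GPD, as in (5.8)."""
    if beta <= 0:
        raise ValueError("beta must be positive")
    if x <= 0:
        return 0.0
    if xi == 0:
        return 1.0 - math.exp(-x / beta)
    if xi < 0 and x >= -beta / xi:
        return 1.0  # at or beyond the finite right endpoint -beta/xi
    return 1.0 - (1.0 + xi * x / beta) ** (-1.0 / xi)


def gpd_mean(xi, beta):
    """Mean beta / (1 - xi) from (5.9); infinite when xi >= 1."""
    return beta / (1.0 - xi) if xi < 1 else math.inf
```

Evaluating `gpd_cdf` at a very small positive shape parameter and comparing with the exponential case gives a quick numerical check of the continuity in the shape parameter noted above.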


The role of the GPD in EVT is as a natural model for the excess distribution over 
a high threshold. We define this concept along with the mean excess function, which 
will also play an important role in the theory. 
Definition 5.17 (excess distribution over threshold u). Let X be an rv with df F. 
The excess distribution over the threshold u has df 

$$
F_u(x) = P(X - u \le x \mid X > u) = \frac{F(x + u) - F(u)}{1 - F(u)} \tag{5.10}
$$

for $0 \le x < x_F - u$, where $x_F \le \infty$ is the right endpoint of F. 


Definition 5.18 (mean excess function). The mean excess function of an rv X with 
finite mean is given by 

$$
e(u) = E(X - u \mid X > u). \tag{5.11}
$$


The excess df $F_u$ describes the distribution of the excess loss over the threshold 
u, given that u is exceeded. The mean excess function $e(u)$ expresses the mean of 
$F_u$ as a function of u. In survival analysis the excess df is more commonly known as 
the residual life df—it expresses the probability that, say, an electrical component 
that has functioned for u units of time fails in the time period (u, u + x]. The mean 
excess function is known as the mean residual life function and gives the expected 
residual lifetime for components with different ages. For the special case of the 
GPD, the excess df and mean excess function are easily calculated. 


Example 5.19 (excess distribution of exponential and GPD). If F is the df of 
an exponential rv, then it is easily verified that $F_u(x) = F(x)$ for all x, which is 
the famous lack-of-memory property of the exponential distribution—the residual 
lifetime of the aforementioned electrical component would be independent of the 
amount of time that component has already survived. More generally, if X has df 
$F = G_{\xi,\beta}$, then, using (5.10), the excess df is easily calculated to be 

$$
F_u(x) = G_{\xi,\beta(u)}(x), \qquad \beta(u) = \beta + \xi u, \tag{5.12}
$$

where $0 \le x < \infty$ if $\xi \ge 0$ and $0 \le x \le -(\beta/\xi) - u$ if $\xi < 0$. The excess 
distribution remains a GPD with the same shape parameter $\xi$ but with a scaling that 
grows linearly with the threshold u. The mean excess function of the GPD is easily 
calculated from (5.12) and (5.9) to be 

$$
e(u) = \frac{\beta(u)}{1 - \xi} = \frac{\beta + \xi u}{1 - \xi}, \tag{5.13}
$$

where $0 \le u < \infty$ if $0 \le \xi < 1$ and $0 \le u \le -\beta/\xi$ if $\xi < 0$. It may be observed 
that the mean excess function is linear in the threshold u, which is a characterizing 
property of the GPD. 
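
The stability property in Example 5.19 can be checked numerically. The sketch below computes the excess survival function directly from (5.10) and compares it with the rescaled GPD claimed in (5.12). The function names and the parameter values are illustrative assumptions and are not taken from any data set.

```python
import math


def gpd_sf(x, xi, beta):
    """Survival function 1 - G_{xi,beta}(x) of the GPD for x >= 0."""
    if xi == 0:
        return math.exp(-x / beta)
    base = 1.0 + xi * x / beta
    return base ** (-1.0 / xi) if base > 0 else 0.0


def excess_sf(x, u, xi, beta):
    """Survival function of the excess over u, computed directly from (5.10)."""
    return gpd_sf(x + u, xi, beta) / gpd_sf(u, xi, beta)


def gpd_mean_excess(u, xi, beta):
    """Mean excess function (5.13); linear in u and valid for xi < 1."""
    return (beta + xi * u) / (1.0 - xi)


# Illustrative (hypothetical) parameter values.
xi, beta, u, x = 0.5, 7.0, 3.0, 4.0
direct = excess_sf(x, u, xi, beta)
via_rescaling = gpd_sf(x, xi, beta + xi * u)  # the claim in (5.12)
```

The two quantities `direct` and `via_rescaling` should agree up to floating-point error, and setting the shape parameter to zero reproduces the lack-of-memory property of the exponential distribution.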


Example 5.19 shows that the GPD has a kind of stability property under the oper- 
ation of calculating excess distributions. We now give a mathematical result that 
shows that the GPD is, in fact, a natural limiting excess distribution for many under- 
lying loss distributions. The result can also be viewed as a characterization theorem 
for the maximum domain of attraction of the GEV distribution. In Section 5.1.2 we 
looked separately at characterizations for each of the three cases $\xi > 0$, $\xi = 0$ and 
$\xi < 0$; the following result offers a global characterization of $\mathrm{MDA}(H_\xi)$ for all $\xi$ 
in terms of the limiting behaviour of excess distributions over thresholds. 
Theorem 5.20 (Pickands–Balkema–de Haan). We can find a (positive-measurable 
function) $\beta(u)$ such that 

$$
\lim_{u \to x_F} \sup_{0 \le x < x_F - u} \left| F_u(x) - G_{\xi,\beta(u)}(x) \right| = 0
$$

if and only if $F \in \mathrm{MDA}(H_\xi)$, $\xi \in \mathbb{R}$. 


Thus the distributions for which normalized maxima converge to a GEV distri- 
bution constitute a set of distributions for which the excess distribution converges 
to the GPD as the threshold is raised; moreover, the shape parameter of the limiting 
GPD for the excesses is the same as the shape parameter of the limiting GEV dis- 
tribution for the maxima. We have already stated in Section 5.1.2 that essentially all 
the commonly used continuous distributions of statistics are in $\mathrm{MDA}(H_\xi)$ for some 
$\xi$, so Theorem 5.20 proves to be a very widely applicable result that essentially says 
that the GPD is the canonical distribution for modelling excess losses over high 
thresholds. 


5.2.2 Modelling Excess Losses 


We exploit Theorem 5.20 by assuming that we are dealing with a loss distribution 
$F \in \mathrm{MDA}(H_\xi)$ so that, for some suitably chosen high threshold u, we can 
model $F_u$ by a generalized Pareto distribution. We formalize this with the following 
assumption. 


Assumption 5.21. Let F be a loss distribution with right endpoint $x_F$ and assume 
that for some high threshold u we have $F_u(x) = G_{\xi,\beta}(x)$ for $0 \le x < x_F - u$ and 
some $\xi \in \mathbb{R}$ and $\beta > 0$. 


This is clearly an idealization, since in practice the excess distribution will gen- 
erally not be exactly GPD, but we use Assumption 5.21 to make a number of calcu- 
lations in the following sections. 


The method. Given loss data $X_1, \ldots, X_n$ from F, a random number $N_u$ will exceed 
our threshold u; it will be convenient to relabel these data $\tilde{X}_1, \ldots, \tilde{X}_{N_u}$. For each 
of these exceedances we calculate the amount $Y_j = \tilde{X}_j - u$ of the excess loss. We 
wish to estimate the parameters of a GPD model by fitting this distribution to the 
$N_u$ excess losses. There are various ways of fitting the GPD, including maximum 
likelihood (ML) and probability-weighted moments (PWM). The former method is 
more commonly used and is easy to implement if the excess data can be assumed 
to be realizations of independent rvs, since the joint density will then be a product 
of marginal GPD densities. 
Writing $g_{\xi,\beta}$ for the density of the GPD, the log-likelihood may be easily calculated 
to be 

$$
\begin{aligned}
\ln L(\xi, \beta; Y_1, \ldots, Y_{N_u})
&= \sum_{j=1}^{N_u} \ln g_{\xi,\beta}(Y_j) \\
&= -N_u \ln \beta - \left(1 + \frac{1}{\xi}\right) \sum_{j=1}^{N_u} \ln\left(1 + \xi \frac{Y_j}{\beta}\right),
\end{aligned}
\tag{5.14}
$$

which must be maximized subject to the parameter constraints that $\beta > 0$ and 
$1 + \xi Y_j/\beta > 0$ for all j. Solving the maximization problem yields a GPD model 
$G_{\hat{\xi},\hat{\beta}}$ for the excess distribution $F_u$. 
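
The log-likelihood (5.14) and its constraints might be coded along the following lines. The function name `gpd_loglik` is an assumption made here, and the branch for a zero shape parameter uses the exponential limit of the GPD density, which is added for completeness. The sketch evaluates the objective only; the maximization itself would be left to a numerical optimizer.

```python
import math


def gpd_loglik(xi, beta, excesses):
    """GPD log-likelihood (5.14) for excess losses Y_1, ..., Y_{N_u}.

    Returns -inf whenever a parameter constraint is violated.
    """
    if beta <= 0:
        return -math.inf
    n_u = len(excesses)
    if xi == 0:
        # Exponential limit of the GPD density (an addition to (5.14)).
        return -n_u * math.log(beta) - sum(excesses) / beta
    total = -n_u * math.log(beta)
    for y in excesses:
        z = 1.0 + xi * y / beta
        if z <= 0:
            return -math.inf  # excess lies beyond the right endpoint
        total -= (1.0 + 1.0 / xi) * math.log(z)
    return total
```

Returning minus infinity outside the admissible region is one simple way of enforcing the constraints when the function is passed to an unconstrained optimizer.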


Non-iid data. For insurance or operational risk data the iid assumption is often 
unproblematic, but this is clearly not true for time series of financial returns. If the 
data are serially dependent but show no tendency to give clusters of extreme values, 
then this might suggest that the underlying process has extremal index $\theta = 1$. In this 
case, asymptotic theory that we summarize in Section 5.3 suggests a limiting model 
for high-level threshold exceedances, in which exceedances occur according to a 
Poisson process and the excess loss amounts are iid generalized Pareto distributed. 
If extremal clustering is present, suggesting an extremal index $\theta < 1$ (as would 
be consistent with an underlying GARCH process), the assumption of independent 
excess losses is less satisfactory. The easiest approach is to neglect this problem and 
to consider the ML method to be a QML method, where the likelihood is misspecified 
with respect to the serial dependence structure of the data; we follow this course in 
this section. The point estimates should still be reasonable, although standard errors 
may be too small. In Section 5.3 we discuss threshold exceedances in non-iid data 
in more detail. 


Excesses over higher thresholds. From the model we have fitted to the excess 
distribution over u, we can easily infer a model for the excess distribution over any 
higher threshold. We have the following lemma. 


Lemma 5.22. Under Assumption 5.21 it follows that $F_v(x) = G_{\xi,\beta+\xi(v-u)}(x)$ for 
any higher threshold $v \ge u$. 


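Proof. We use (5.10) and the df of the GPD in (5.8) to infer that 

$$
\begin{aligned}
\bar{F}_v(x) &= \frac{\bar{F}(v + x)}{\bar{F}(v)} = \frac{\bar{F}(u + (x + v - u))}{\bar{F}(u)} \, \frac{\bar{F}(u)}{\bar{F}(u + (v - u))} \\
&= \frac{\bar{F}_u(x + v - u)}{\bar{F}_u(v - u)} = \frac{\bar{G}_{\xi,\beta}(x + v - u)}{\bar{G}_{\xi,\beta}(v - u)} \\
&= \bar{G}_{\xi,\beta+\xi(v-u)}(x).
\end{aligned}
$$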


Thus the excess distribution over higher thresholds remains a GPD with the same 
$\xi$ parameter but a scaling that grows linearly with the threshold v. Provided that 
$\xi < 1$, the mean excess function is given by 

$$
e(v) = \frac{\beta + \xi(v - u)}{1 - \xi} = \frac{\xi v}{1 - \xi} + \frac{\beta - \xi u}{1 - \xi}, \tag{5.15}
$$

where $u \le v < \infty$ if $0 \le \xi < 1$ and $u \le v \le u - \beta/\xi$ if $\xi < 0$. 

The linearity of the mean excess function (5.15) in v is commonly used as a 
diagnostic for data admitting a GPD model for the excess distribution. It forms 
the basis for the following simple graphical method for choosing an appropriate 
threshold. 
Sample mean excess plot. For positive-valued loss data $X_1, \ldots, X_n$ we define 
the sample mean excess function to be an empirical estimator of the mean excess 
function in Definition 5.18. The estimator is given by 

$$
e_n(v) = \frac{\sum_{i=1}^{n} (X_i - v) I_{\{X_i > v\}}}{\sum_{i=1}^{n} I_{\{X_i > v\}}}. \tag{5.16}
$$


To study the sample mean excess function we generally construct the mean excess 
plot $\{(X_{i,n}, e_n(X_{i,n})) : 2 \le i \le n\}$, where $X_{i,n}$ denotes the upper (or descending) 
ith order statistic. If the data support a GPD model over a high threshold, then (5.15) 
suggests that this plot should become increasingly “linear” for higher values of v. 
A linear upward trend indicates a GPD model with positive shape parameter $\xi$; a 
plot tending towards the horizontal indicates a GPD with approximately zero shape 
parameter, or, in other words, an exponential excess distribution; a linear downward 
trend indicates a GPD with negative shape parameter. 
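
The sample mean excess function and the points of the mean excess plot can be computed as in the following sketch. The function names and the toy data are assumptions made for illustration. Thresholds equal to the sample maximum are skipped, since no observation exceeds them.

```python
def sample_mean_excess(losses, v):
    """Sample mean excess e_n(v): average of X_i - v over observations with X_i > v."""
    excesses = [x - v for x in losses if x > v]
    if not excesses:
        raise ValueError("no observations exceed v")
    return sum(excesses) / len(excesses)


def mean_excess_plot_points(losses):
    """Points (X_{i,n}, e_n(X_{i,n})) for i = 2, ..., n, in descending order of X."""
    ordered = sorted(losses, reverse=True)  # X_{1,n} >= X_{2,n} >= ...
    # Skip thresholds tied with the sample maximum, where e_n is undefined.
    return [(x, sample_mean_excess(losses, x)) for x in ordered[1:] if x < ordered[0]]


# Toy data for illustration only.
points = mean_excess_plot_points([1.0, 2.0, 4.0, 8.0])
```

Plotting the second coordinate against the first gives the sample mean excess plot. As discussed in the text, the right-most points are averages over very few excesses and are often best ignored.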

These are the ideal situations, but in practice some experience is required to read 
mean excess plots. Even for data that are genuinely generalized Pareto distributed, 
the sample mean excess plot is seldom perfectly linear, particularly towards the 
right-hand end, where we are averaging a small number of large excesses. In fact, 
we often omit the final few points from consideration, as they can severely distort 
the picture. If we do see visual evidence that the mean excess plot becomes linear, 
then we might select as our threshold u a value towards the beginning of the linear 
section of the plot (see, in particular, Example 5.24). 


Example 5.23 (Danish fire loss data). The Danish fire insurance data are a well- 
studied set of financial losses that neatly illustrate the basic ideas behind mod- 
elling observations that seem consistent with an iid model. The data set consists of 
2156 fire insurance losses over 1 000 000 Danish kroner from 1980 to 1990 inclusive, 
expressed in units of 1 000 000 kroner. The loss figure represents a combined 
loss for a building and its contents, as well as in some cases a loss of business 
earnings; the losses are inflation adjusted to reflect 1985 values and are shown in 
Figure 5.4 (a). 

The sample mean excess plot in Figure 5.4 (b) is in fact fairly “linear” over the 
entire range of the losses, and its upward slope leads us to expect that a GPD with 
positive shape parameter $\xi$ could be fitted to the entire data set. However, there 
is some evidence of a “kink” in the plot below the value 10 and a “straightening 
out” of the plot above this value, so we have chosen to set our threshold at u = 10 
and fit a GPD to excess losses above this threshold, in the hope of obtaining a 
model that is a good fit to the largest of the losses. The ML parameter estimates 
are $\hat{\xi} = 0.50$ and $\hat{\beta} = 7.0$ with standard errors 0.14 and 1.1, respectively. Thus the 
model we have fitted is essentially a very heavy-tailed, infinite-variance model. A 
picture of the fitted GPD model for the excess distribution $\hat{F}_u(x - u)$ is also given in 
Figure 5.4 (c), superimposed on points plotted at empirical estimates of the excess 
probabilities for each loss; note the good correspondence between the empirical 
estimates and the GPD curve. 
Figure 5.4. (a) Time-series plot of the Danish data. (b) Sample mean excess plot. 
(c) Empirical distribution of excesses and fitted GPD. See Example 5.23 for full details. 


In insurance we might use the model to estimate the expected size of the insur- 
ance loss, given that it enters a given insurance layer. Thus we can estimate 
the expected loss size given exceedance of the threshold of 10 000 000 kroner 
or of any other higher threshold by using (5.15) with the appropriate parameter 
estimates. 


Example 5.24 (AT&T weekly loss data). Suppose we have an investment in AT&T 
stock and want to model weekly losses in value using an unconditional approach. If 
$X_t$ denotes the weekly log-return, then the percentage loss in value of our position 
over a week is given by $L_t = 100(1 - \exp(X_t))$, and data on this loss for the 521 
complete weeks in the period 1991-2000 are shown in Figure 5.5 (a). 
Figure 5.5. (a) Time-series plot of AT&T weekly percentage loss data. (b) Sample mean 
excess plot. (c) Empirical distribution of excesses and fitted GPD. See Example 5.24 for full 
details. 


A sample mean excess plot of the positive loss values is shown in Figure 5.5 (b) 
and this suggests that a threshold can be found above which a GPD approximation to 
the excess distribution should be possible. We have chosen to position the threshold 
at a loss value of 2.75% and this gives 102 exceedances. 

We observed in Section 3.1 that monthly AT&T return data over the period 1993- 
2000 do not appear consistent with a strict white noise hypothesis, so the issue of 
whether excess losses can be modelled as independent is relevant. This issue is taken 
up in Section 5.3 but for the time being we ignore it and implement a standard ML 
approach to estimating the parameters of a GPD model for the excess distribution; 
we obtain the estimates $\hat{\xi} = 0.22$ and $\hat{\beta} = 2.1$ with standard errors 0.13 and 0.34, 
respectively. Thus the model we have fitted is a model that is close to having an 
infinite fourth moment. A picture of the fitted GPD model for the excess distribution 
$\hat{F}_u(x - u)$ is also given in Figure 5.5 (c), superimposed on points plotted at empirical 
estimates of the excess probabilities for each loss. 
5.2.3 Modelling Tails and Measures of Tail Risk 


In this section we describe how the GPD model for the excess losses is used to 
estimate the tail of the underlying loss distribution F and associated risk measures. 
To make the necessary theoretical calculations we again make Assumption 5.21. 


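Tail probabilities and risk measures. We observe firstly that under Assumption 5.21 
we have, for $x \ge u$, 

$$
\begin{aligned}
\bar{F}(x) &= P(X > u) P(X > x \mid X > u) \\
&= \bar{F}(u) P(X - u > x - u \mid X > u) \\
&= \bar{F}(u) \bar{F}_u(x - u) \\
&= \bar{F}(u) \left(1 + \xi \frac{x - u}{\beta}\right)^{-1/\xi},
\end{aligned}
\tag{5.17}
$$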


which, if we know $F(u)$, gives us a formula for tail probabilities. This formula 
may be inverted to obtain a high quantile of the underlying distribution, which we 
interpret as a VaR. For $\alpha \ge F(u)$ we have that VaR is equal to 

$$
\mathrm{VaR}_\alpha = q_\alpha(F) = u + \frac{\beta}{\xi}\left(\left(\frac{1 - \alpha}{\bar{F}(u)}\right)^{-\xi} - 1\right). \tag{5.18}
$$


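Assuming that $\xi < 1$, the associated expected shortfall can be calculated easily 
from (2.22) and (5.18). We obtain 

$$
\mathrm{ES}_\alpha = \frac{1}{1 - \alpha} \int_\alpha^1 q_x(F) \, dx = \frac{\mathrm{VaR}_\alpha}{1 - \xi} + \frac{\beta - \xi u}{1 - \xi}. \tag{5.19}
$$
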
Note that Assumption 5.21 and Lemma 5.22 imply that excess losses above $\mathrm{VaR}_\alpha$ 
have a GPD distribution satisfying $F_{\mathrm{VaR}_\alpha} = G_{\xi,\beta+\xi(\mathrm{VaR}_\alpha - u)}$. The expected shortfall 
estimator in (5.19) can also be obtained by adding the mean of this distribution to 
$\mathrm{VaR}_\alpha$, i.e. $\mathrm{ES}_\alpha = \mathrm{VaR}_\alpha + e(\mathrm{VaR}_\alpha)$, where $e(\mathrm{VaR}_\alpha)$ is given in (5.15). It is interesting 
to look at how the ratio of the two risk measures behaves for large values of the 
quantile probability $\alpha$. It is easily calculated from (5.18) and (5.19) that 

$$
\lim_{\alpha \to 1} \frac{\mathrm{ES}_\alpha}{\mathrm{VaR}_\alpha} =
\begin{cases}
(1 - \xi)^{-1}, & \xi \ge 0, \\
1, & \xi < 0,
\end{cases}
\tag{5.20}
$$

so the shape parameter $\xi$ of the GPD effectively determines the ratio when we go 
far enough out into the tail. 
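
The formulas (5.18) and (5.19) translate directly into code. In the sketch below the function names are assumptions made here, the argument `tail_prob_u` stands for the tail probability of the loss distribution at the threshold u, and the numerical inputs are hypothetical values chosen only to illustrate the calculation. A non-zero shape parameter is assumed.

```python
import math


def gpd_var(alpha, u, xi, beta, tail_prob_u):
    """VaR from (5.18); valid for alpha >= 1 - tail_prob_u and xi != 0."""
    if alpha < 1.0 - tail_prob_u:
        raise ValueError("alpha lies below the df value at the threshold")
    return u + (beta / xi) * (((1.0 - alpha) / tail_prob_u) ** (-xi) - 1.0)


def gpd_es(alpha, u, xi, beta, tail_prob_u):
    """Expected shortfall from (5.19); infinite when xi >= 1."""
    if xi >= 1:
        return math.inf
    var = gpd_var(alpha, u, xi, beta, tail_prob_u)
    return var / (1.0 - xi) + (beta - xi * u) / (1.0 - xi)


# Hypothetical inputs: threshold 10, shape 0.5, scale 7, 5% of losses above u.
var_99 = gpd_var(0.99, 10.0, 0.5, 7.0, 0.05)
es_99 = gpd_es(0.99, 10.0, 0.5, 7.0, 0.05)
ratio_far_out = gpd_es(1.0 - 1e-12, 10.0, 0.5, 7.0, 0.05) / gpd_var(1.0 - 1e-12, 10.0, 0.5, 7.0, 0.05)
```

For these inputs the expected shortfall coincides with the VaR plus the mean excess (5.15) evaluated at the VaR, and `ratio_far_out` should be very close to the limiting value 2 given by (5.20) for a shape parameter of 0.5.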


Estimation in practice. We note that, under Assumption 5.21, tail probabilities, 
VaRs and expected shortfalls are all given by formulas of the form $g(\xi, \beta, \bar{F}(u))$. 
Assuming that we have fitted a GPD to excess losses over a threshold u, as described 
in Section 5.2.2, we estimate these quantities by first replacing $\xi$ and $\beta$ in formulas 
(5.17)-(5.19) by their estimates. Of course, we also require an estimate of $\bar{F}(u)$ 
and here we take the simple empirical estimator $N_u/n$. In doing this, we are implicitly 
assuming that there is a sufficient proportion of sample values above the threshold 
u to estimate $\bar{F}(u)$ reliably. However, we hope to gain over the empirical method by 
using a kind of extrapolation based on the GPD for more extreme tail probabilities 
and risk measures. For tail probabilities we obtain an estimator, first proposed by 
Smith (1987), of the form 

$$
\hat{\bar{F}}(x) = \frac{N_u}{n}\left(1 + \hat{\xi} \frac{x - u}{\hat{\beta}}\right)^{-1/\hat{\xi}}, \tag{5.21}
$$

which we stress is only valid for $x \ge u$. For $\alpha \ge 1 - N_u/n$ we obtain analogous 
point estimators of $\mathrm{VaR}_\alpha$ and $\mathrm{ES}_\alpha$ from (5.18) and (5.19). 
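
A minimal sketch of the tail estimator (5.21) is given below. The function name `tail_prob_estimate` is an assumption made here, a non-zero shape estimate is assumed, and the inputs are the rounded figures reported for the AT&T data in Example 5.24 (threshold 2.75%, 102 exceedances among 521 weekly losses).

```python
def tail_prob_estimate(x, u, xi_hat, beta_hat, n_exceed, n):
    """Estimated P(X > x) for x >= u, in the form of estimator (5.21)."""
    if x < u:
        raise ValueError("the estimator is only valid for x >= u")
    empirical_tail = n_exceed / n  # empirical estimate of the tail probability at u
    return empirical_tail * (1.0 + xi_hat * (x - u) / beta_hat) ** (-1.0 / xi_hat)


# Rounded AT&T figures from Example 5.24.
p_at_threshold = tail_prob_estimate(2.75, 2.75, 0.22, 2.1, 102, 521)
p_at_10 = tail_prob_estimate(10.0, 2.75, 0.22, 2.1, 102, 521)
```

At the threshold itself the estimator reduces to the empirical proportion of exceedances. With these rounded inputs, the estimated probability of a weekly loss above 10% should be about 1.5%.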

Of course, we would also like to obtain confidence intervals. If we have taken the 
likelihood approach to estimating $\xi$ and $\beta$, then it is quite easy to give confidence 
intervals for $g(\hat{\xi}, \hat{\beta}, N_u/n)$ that take into account the uncertainty in $\hat{\xi}$ and $\hat{\beta}$, but 
neglect the uncertainty in $N_u/n$ as an estimator of $\bar{F}(u)$. We use the approach 
described at the end of Section 5.1.4 for return levels, whereby the GPD model is 
reparametrized in terms of $\phi = g(\xi, \beta, N_u/n)$, and a confidence interval for $\phi$ is 
constructed based on the likelihood ratio test as in Section A.3.5. 


Example 5.25 (risk measures for AT&T loss data). Suppose we have fitted a GPD 
model to excess weekly losses above the threshold u = 2.75%, as in Example 5.24. 
We use this model to obtain estimates of the 99% VaR and expected shortfall of the 
underlying weekly loss distribution. The essence of the method is displayed in Fig- 
ure 5.6; this is a plot of estimated tail probabilities on logarithmic axes, with various 
dotted lines superimposed to indicate the estimation of risk measures and associated 
confidence intervals. The points on the graph are the 102 threshold exceedances and 
are plotted at y-values corresponding to the tail of the empirical distribution function; 
the smooth curve running through the points is the tail estimator (5.21). 

Estimation of the 99% quantile amounts to determining the point of intersection 
of the tail estimation curve and the horizontal line $\bar{F}(x) = 0.01$ (not marked on 
the graph); the first vertical dotted line shows the quantile estimate. The horizontal 
dotted line aids in the visualization of a 95% confidence interval for the VaR estimate; 
the degree of confidence is shown on the alternative y-axis to the right of the plot. 
The boundaries of a 95% confidence interval are obtained by determining the two 
points of intersection of this horizontal line with the dotted curve, which is a profile 
likelihood curve for the VaR as a parameter of the GPD model and is constructed 
using likelihood ratio test arguments as in Section A.3.5. Dropping the horizontal 
line to the 99% mark would correspond to constructing a 99% confidence interval 
for the estimate of the 99% VaR. The point estimate and the 95% confidence interval 
for the 99% quantile are estimated to be 11.7% and (9.6, 16.1). 

The second vertical line on the plot shows the point estimate of the 99% expected 
shortfall. A 95% confidence interval is determined from the dotted horizontal line 
and its points of intersection with the second dotted curve. The point estimate and 
the 95% confidence interval are 17.0% and (12.7, 33.6). Note that if we take the 
ratio of the point estimates of the shortfall and the VaR, we get $17/11.7 \approx 1.45$, 
which is larger than the asymptotic ratio $(1 - \hat{\xi})^{-1} = 1.29$ suggested by (5.20); this 
is generally the case at finite levels and is explained by the second term in (5.19) 
being a non-negligible positive quantity. 
Figure 5.6. The smooth curve through the points shows the estimated tail of the AT&T 
weekly percentage loss data using the estimator (5.21). Points are plotted at empirical tail 
probabilities calculated from empirical df. The vertical dotted lines show estimates of 99% 
VaR and expected shortfall. The other curves are used in the construction of confidence 
intervals. See Example 5.25 for full details. 


Before leaving the topic of GPD tail modelling it is clearly important to see how 
sensitive our risk-measure estimates are to the choice of the threshold. Hitherto, 
we have considered single choices of threshold u and looked at a series of incre- 
mental calculations that always build on the same GPD model for excesses over that 
threshold. We would hope that there is some robustness to our inference for different 
choices of threshold. 


Example 5.26 (varying the threshold). In the case of the AT&T weekly loss 
data the influence of different thresholds is investigated in Figure 5.7. Given the 
importance of the $\xi$ parameter in determining the weight of the tail and the
relationship between quantiles and expected shortfalls, we first show how estimates
of $\xi$ vary as we consider a series of thresholds that give us between 20 and 150
exceedances. In fact, the estimates remain fairly constant around a value of
approximately 0.2; a symmetric 95% confidence interval constructed from the standard
error estimate is also shown, and it indicates how the uncertainty about the parameter
value decreases as the threshold is lowered or the number of threshold exceedances
is increased.

Figure 5.7. (a) Estimate of $\xi$ for different thresholds $u$ and numbers of exceedances $N_u$,
together with a 95% confidence interval based on the standard error. (b) Associated point
estimates of the 99% VaR (solid line) and the expected shortfall (dotted line). See Example 5.26
for commentary.

Point estimates of the 99% VaR and expected shortfall are also shown.
The former remain remarkably constant around 12%, while the latter show modest
variability that essentially tracks the variability of the $\xi$ estimate. These
pictures provide some reassurance that different thresholds do not lead to drastically
different conclusions. We return to the issue of threshold choice in Section 5.2.5.
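
The kind of threshold-sensitivity check carried out in Example 5.26 can be sketched in code. The following is only an illustration under stated assumptions and is not the procedure behind Figure 5.7: it uses simulated exact GPD data as a stand-in for the AT&T losses, it fits the GPD by probability-weighted moments rather than ML because PWM has a closed form, and the function names `pwm_gpd_fit` and `threshold_sweep` are hypothetical.

```python
import random

def pwm_gpd_fit(excesses):
    """Probability-weighted-moment estimates (xi, beta) for a GPD sample (needs xi < 1)."""
    y = sorted(excesses)                       # ascending order
    n = len(y)
    a0 = sum(y) / n                            # estimates E[Y]
    # unbiased estimate of E[Y * (1 - G(Y))], G being the df of Y
    a1 = sum((n - j) / (n - 1) * y[j - 1] for j in range(1, n + 1)) / n
    xi = 2.0 - a0 / (a0 - 2.0 * a1)
    beta = 2.0 * a0 * a1 / (a0 - 2.0 * a1)
    return xi, beta

def threshold_sweep(losses, exceedance_counts):
    """Fit a GPD to the excesses over the threshold giving each exceedance count."""
    ordered = sorted(losses, reverse=True)     # X_{1,n} >= X_{2,n} >= ...
    results = []
    for n_u in exceedance_counts:
        u = ordered[n_u]                       # the n_u larger values exceed u
        excesses = [x - u for x in ordered[:n_u]]
        xi, beta = pwm_gpd_fit(excesses)
        results.append((n_u, u, xi, beta))
    return results

# Assumed stand-in data: exact GPD with xi = 0.2, beta = 1 (inverse-transform sampling).
rng = random.Random(1)
true_xi, true_beta = 0.2, 1.0
losses = [true_beta / true_xi * ((1.0 - rng.random()) ** (-true_xi) - 1.0)
          for _ in range(20000)]

sweep = threshold_sweep(losses, [1000, 2000, 4000])
for n_u, u, xi, beta in sweep:
    print(f"N_u={n_u:5d}  u={u:6.3f}  xi={xi:5.3f}  beta={beta:5.3f}")
```

For exact GPD data the excesses over any higher threshold are again GPD with the same $\xi$, so the printed $\xi$ estimates should be roughly stable across rows, while the scale estimate changes with the threshold.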


5.2.4 The Hill Method 


The GPD method is not the only way to estimate the tail of a distribution and, as an
alternative, we describe in this section the well-known Hill approach to modelling
the tails of heavy-tailed distributions.

Estimating the tail index. For this method we assume that the underlying loss
distribution is in the maximum domain of attraction of the Fréchet distribution so
that, by Theorem 5.8, it has a tail of the form

$$\bar{F}(x) = x^{-\alpha} L(x) \qquad (5.22)$$

for a slowly varying function $L$ (see Definition 5.7) and a positive parameter $\alpha$.
Traditionally, in the Hill approach, interest centres on the tail index $\alpha$, rather than
its reciprocal $\xi$, which appears in (5.3). The goal is to find an estimator of $\alpha$ based
on identically distributed data $X_1, \ldots, X_n$.

The Hill estimator can be derived in various ways (see EKM, pp. 330-336).
Perhaps the most elegant is to consider the mean excess function of the generic
logarithmic loss $\ln X$, where $X$ is an rv with df (5.22). Writing $e^*$ for the mean
excess function of $\ln X$ and using integration by parts we find that

$$
\begin{aligned}
e^*(\ln u) &= E(\ln X - \ln u \mid \ln X > \ln u) \\
&= \frac{1}{\bar{F}(u)} \int_u^\infty (\ln x - \ln u)\, dF(x) \\
&= \frac{1}{\bar{F}(u)} \int_u^\infty \frac{\bar{F}(x)}{x}\, dx \\
&= \frac{1}{\bar{F}(u)} \int_u^\infty L(x) x^{-(\alpha+1)}\, dx.
\end{aligned}
$$

For $u$ sufficiently large, the slowly varying function $L(x)$ for $x \ge u$ can essentially be
treated as a constant and taken outside the integral. More formally, using Karamata’s
Theorem (see Section A.1.4), we get, for $u \to \infty$,

$$e^*(\ln u) \sim \frac{L(u) u^{-\alpha} \alpha^{-1}}{\bar{F}(u)} = \alpha^{-1},$$

so $\lim_{u \to \infty} \alpha e^*(\ln u) = 1$. We expect to see similar tail behaviour in the sample
mean excess function $e_n^*$ (see (5.16)) constructed from the log observations. That
is, we expect that $e_n^*(\ln X_{k,n}) \approx \alpha^{-1}$ for $n$ large and $k$ sufficiently small, where
$X_{n,n} \le \cdots \le X_{1,n}$ are the order statistics as usual. Evaluating $e_n^*(\ln X_{k,n})$ gives us
the estimator $\hat{\alpha}^{-1} = \bigl((k-1)^{-1} \sum_{j=1}^{k-1} \ln X_{j,n} - \ln X_{k,n}\bigr)$. The standard form of the
Hill estimator is obtained by a minor modification:

$$\hat{\alpha}^{(H)}_{k,n} = \left( \frac{1}{k} \sum_{j=1}^{k} \ln X_{j,n} - \ln X_{k,n} \right)^{-1}, \quad 2 \le k \le n. \qquad (5.23)$$
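
As a concrete illustration of (5.23), the following minimal sketch computes the Hill estimate from the $k$ largest observations. The function name `hill_estimator` and the tiny sample are hypothetical; the sample is chosen so that the log order statistics are 3, 2 and 1 and the arithmetic can be checked by hand.

```python
import math

def hill_estimator(data, k):
    """Hill estimate of the tail index alpha from the k largest observations, as in (5.23)."""
    n = len(data)
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    ordered = sorted(data, reverse=True)           # X_{1,n} >= ... >= X_{n,n}
    if ordered[k - 1] <= 0:
        raise ValueError("the k largest observations must be positive")
    logs = [math.log(x) for x in ordered[:k]]
    mean_log_excess = sum(logs) / k - logs[k - 1]  # (1/k) sum ln X_{j,n} - ln X_{k,n}
    return 1.0 / mean_log_excess                   # fails if the k largest values are all tied

sample = [math.exp(3), math.exp(2), math.exp(1)]
alpha_2 = hill_estimator(sample, 2)   # (3 + 2)/2 - 2 = 0.5, so alpha is about 2
alpha_3 = hill_estimator(sample, 3)   # (3 + 2 + 1)/3 - 1 = 1, so alpha is about 1
```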
The Hill estimator is one of the best-studied estimators in the EVT literature. The
asymptotic properties (consistency, asymptotic normality) of this estimator (as sample
size $n \to \infty$, number of extremes $k \to \infty$ and the so-called tail-fraction
$k/n \to 0$) have been extensively investigated under various assumed models for the
data, including ARCH and GARCH (see Notes and Comments). We concentrate on
the use of the estimator in practice and, in particular, on its performance relative to
the GPD estimation approach.

Figure 5.8. Hill plots showing estimates of the tail index $\alpha = 1/\xi$ for (a), (b) the AT&T
weekly percentage losses and (c), (d) the Danish fire loss data. Parts (b) and (d) are expanded
versions of sections of (a) and (c) showing Hill estimates based on up to 60 order statistics.


When the data are from a distribution with a tail that is close to a perfect power
function, the Hill estimator is often a good estimator of $\alpha$, or its reciprocal $\xi$. In
practice, the general strategy is to plot Hill estimates for various values of $k$. This
gives the Hill plot $\{(k, \hat{\alpha}^{(H)}_{k,n}) : k = 2, \ldots, n\}$. We hope to find a stable region in the
Hill plot where estimates constructed from different numbers of order statistics are
quite similar.
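
The coordinates of a Hill plot can be computed in a single pass using running sums of the log order statistics. The sketch below is one possible way of doing this; the function name `hill_plot_points` and the optional cap `k_max` are hypothetical, and non-positive observations are simply discarded (an assumption that mirrors the use of positive values only in the examples).

```python
import math
from itertools import accumulate

def hill_plot_points(data, k_max=None):
    """Return [(k, alpha_hat_k)] for k = 2, ..., k_max from the log order statistics."""
    logs = sorted((math.log(x) for x in data if x > 0), reverse=True)
    n = len(logs)
    k_max = n if k_max is None else min(k_max, n)
    cumulative = list(accumulate(logs))        # cumulative[k-1] = sum of the k largest logs
    points = []
    for k in range(2, k_max + 1):
        mean_log_excess = cumulative[k - 1] / k - logs[k - 1]
        if mean_log_excess > 0:                # skip degenerate values caused by ties
            points.append((k, 1.0 / mean_log_excess))
    return points

# Hand-checkable sample: log order statistics are 3, 2, 1.
demo_points = hill_plot_points([math.exp(v) for v in (3, 2, 1)])
```

Plotting the second coordinate against the first, typically for a restricted range of small $k$, gives the Hill plot.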


Example 5.27 (Hill plots). We construct Hill plots for the Danish fire data of Exam- 
ple 5.23 and the weekly percentage loss data (positive values only) of Example 5.24 
(shown in Figure 5.8). 

It is very easy to construct the Hill plot for all possible values of k, but it can be 
misleading to do so; practical experience (see Example 5.28) suggests that the best 
choices of k are relatively small: say, 10-50 order statistics in a sample of size 1000.
For this reason we have enlarged sections of the Hill plots showing the estimates
obtained for values of k less than 60. 

For the Danish data, the estimates of $\alpha$ obtained are between 1.5 and 2, suggesting
$\xi$ estimates between 0.5 and 0.67, all of which correspond to infinite-variance models
for these data. Recall that the estimate derived from our GPD model in Example 5.23
was $\hat{\xi} = 0.50$. For the AT&T data, there is no particularly stable region in the plot.
The $\alpha$ estimates based on $k = 2, \ldots, 60$ order statistics mostly range from 2 to 4,
suggesting a $\xi$ value in the range 0.25 to 0.5, which is larger than the values estimated
in Example 5.26 with a GPD model.


Example 5.27 shows that the interpretation of Hill plots can be difficult. In prac- 
tice, various deviations from the ideal situation can occur. If the data do not come 
from a distribution with a regularly varying tail, the Hill method is really not appro- 
priate and Hill plots can be very misleading. Serial dependence in the data can 
also spoil the performance of the estimator, although this is also true for the GPD 
estimator. EKM contains a number of Hill “horror plots” based on simulated data 
illustrating the issues that arise (see Notes and Comments). 


Hill-based tail estimates. For the risk-management applications of this book we
are less concerned with estimating the tail index of heavy-tailed data and more
concerned with tail and risk-measure estimates. We give a heuristic argument for a
standard tail estimator based on the Hill approach. We assume a tail model of the
form $\bar{F}(x) = Cx^{-\alpha}$, $x \ge u > 0$, for some high threshold $u$; in other words, we
replace the slowly varying function by a constant for sufficiently large $x$. For an
appropriate value of $k$ the tail index $\alpha$ is estimated by $\hat{\alpha}^{(H)}_{k,n}$ and the threshold $u$ is
replaced by $X_{k,n}$ (or $X_{(k+1),n}$ in some versions); it remains to estimate $C$. Since
$C$ can be written as $C = u^{\alpha} \bar{F}(u)$, this is equivalent to estimating $\bar{F}(u)$, and the
obvious empirical estimator is $k/n$ (or $(k-1)/n$ in some versions). Putting these
ideas together gives us the Hill tail estimator in its standard form:

$$\hat{\bar{F}}(x) = \frac{k}{n} \left( \frac{x}{X_{k,n}} \right)^{-\hat{\alpha}^{(H)}_{k,n}}, \quad x \ge X_{k,n}. \qquad (5.24)$$

Writing the estimator in this way emphasizes the way it is treated mathematically. For
any pair $k$ and $n$, both the Hill estimator and the associated tail estimator are treated
as functions of the $k$ upper-order statistics from the sample of size $n$. Obviously, it is
possible to invert this estimator to get a quantile estimator and it is also possible to
devise an estimator of expected shortfall using arguments about regularly varying
tails.
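
The inversion mentioned above can be made explicit: setting (5.24) equal to $1 - q$ and solving for $x$ gives $x_q = X_{k,n} \bigl((n/k)(1 - q)\bigr)^{-1/\hat{\alpha}}$, valid when $1 - q \le k/n$. The sketch below implements the tail estimator and this quantile; the function names are hypothetical and the data are a deterministic grid of exact Pareto quantiles with $\alpha = 2$, used purely for illustration.

```python
import math

def hill_tail_fit(data, k):
    """Return (n, k, X_{k,n}, alpha_hat), the ingredients of the tail estimator (5.24)."""
    ordered = sorted(data, reverse=True)
    logs = [math.log(x) for x in ordered[:k]]
    alpha_hat = 1.0 / (sum(logs) / k - logs[-1])
    return len(data), k, ordered[k - 1], alpha_hat

def hill_tail_prob(x, fit):
    """Estimated P(X > x), defined for x >= X_{k,n}."""
    n, k, threshold, alpha_hat = fit
    if x < threshold:
        raise ValueError("estimator is only defined for x >= X_{k,n}")
    return (k / n) * (x / threshold) ** (-alpha_hat)

def hill_quantile(q, fit):
    """Solve hill_tail_prob(x) = 1 - q for x; requires 1 - q <= k/n."""
    n, k, threshold, alpha_hat = fit
    if 1.0 - q > k / n:
        raise ValueError("quantile lies below the effective threshold X_{k,n}")
    return threshold * ((n / k) * (1.0 - q)) ** (-1.0 / alpha_hat)

# Illustrative data: the i-th largest value has exact tail probability i/1000
# under the Pareto tail x^(-2), so the true 99% quantile is 10.
pareto_grid = [(i / 1000.0) ** (-0.5) for i in range(1, 1001)]
fit = hill_tail_fit(pareto_grid, k=50)
var_99 = hill_quantile(0.99, fit)
```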

The GPD-based tail estimator (5.21) is usually treated as a function of a random
number $N_u$ of upper-order statistics for a fixed threshold $u$. The different presentation
of these estimators in the literature is a matter of convention and we can easily recast
both estimators in a similar form. Suppose we rewrite (5.24) in the notation of (5.21)
by substituting $\hat{\xi}^{(H)}$, $u$ and $N_u$ for $1/\hat{\alpha}^{(H)}_{k,n}$, $X_{k,n}$ and $k$, respectively. We get

$$\hat{\bar{F}}(x) = \frac{N_u}{n} \left( 1 + \hat{\xi}^{(H)} \frac{x - u}{\hat{\xi}^{(H)} u} \right)^{-1/\hat{\xi}^{(H)}}, \quad x \ge u.$$

This estimator lacks the additional scaling parameter $\beta$ in (5.21) and tends not to
perform as well, as is shown in simulated examples in the next section.

Figure 5.9. Comparison of (a) estimated MSE, (b) bias and (c) variance for the Hill (dotted
line) and GPD (solid line) estimators of $\xi$, the reciprocal of the tail index, as a function of $k$
(or $N_u$), the number of upper-order statistics from a sample of 1000 $t$-distributed data with
four degrees of freedom. See Example 5.28 for details.


5.2.5 Simulation Study of EVT Quantile Estimators 


First we consider estimation of $\xi$ and then estimation of the high quantile $\mathrm{VaR}_\alpha$. In
both cases estimators are compared using mean squared errors (MSEs); we recall
that the MSE of an estimator $\hat{\theta}$ of a parameter $\theta$ is given by
$\mathrm{MSE}(\hat{\theta}) = E(\hat{\theta} - \theta)^2 = (E(\hat{\theta} - \theta))^2 + \mathrm{var}(\hat{\theta})$,
and thus has the well-known decomposition into squared
bias plus variance. A good estimator should keep both the bias term $E(\hat{\theta} - \theta)$ and
the variance term $\mathrm{var}(\hat{\theta})$ small.

Since analytical evaluation of bias and variance is not possible, we calculate Monte 
Carlo estimates by simulating 1000 data sets in each experiment. The parameters 
of the GPD are determined in all cases by ML; PWM, the main alternative, gives 
slightly different results, but the conclusions are similar. 

We calculate estimates using the Hill method and the GPD method based on different
numbers of upper-order statistics (or differing thresholds) and try to determine
the choice of $k$ (or $N_u$) that is most appropriate for a sample of size $n$. In the case
of estimating VaR we also compare the EVT estimators with the simple empirical
quantile estimator.


Example 5.28 (Monte Carlo experiment). We assume that we have a sample
of 1000 iid data from a $t$ distribution with four degrees of freedom and want to
estimate $\xi$, the reciprocal of the tail index, which in this case has the true value 0.25.
(This is demonstrated in Example 16.1.) The Hill estimate is constructed for $k$ values
in the range $\{2, \ldots, 200\}$, and the GPD estimate is constructed for $k$ (or $N_u$) values
in $\{30, 40, 50, \ldots, 400\}$. The results are shown in Figure 5.9.

Figure 5.10. Comparison of (a) estimated MSE, (b) bias and (c) variance for the Hill
(dotted line) and GPD (solid line) estimators of $\mathrm{VaR}_{0.99}$, as a function of $k$ (or $N_u$), the
number of upper-order statistics from a sample of 1000 $t$-distributed data with four degrees
of freedom. The dashed line also shows results for the (threshold-independent) empirical
quantile estimator. See Example 5.28 for details.


The $t$ distribution has a well-behaved regularly varying tail and the Hill estimator
gives better estimates of $\xi$ than the GPD method, with an optimal value of $k$
around 20-30. The variance plot shows where the Hill method gains over the GPD
method; the variance of the GPD-based estimator is much higher than that of the
Hill estimator for small numbers of order statistics. The magnitudes of the biases
are closer together, with the Hill method tending to overestimate $\xi$ and the GPD
method tending to underestimate it. If we were to use the GPD method, the optimal 
choice of threshold would be one giving 100-150 exceedances. 

The conclusions change when we attempt to estimate the 99% VaR; the results are 
shown in Figure 5.10. The Hill method has a negative bias for low values of k but a 
rapidly growing positive bias for larger values of k; the GPD estimator has a positive 
bias that grows much more slowly; the empirical method has a negative bias. The 
GPD attains its lowest MSE value for a value of k around 100, but, more importantly, 
the MSE is very robust to the choice of k because of the slow growth of the bias. 
The Hill method performs well for $20 \le k \le 75$ (we only use $k$ values that lead to a
quantile estimate beyond the effective threshold $X_{k,n}$) but then deteriorates rapidly.
Both EVT methods obviously outperform the empirical quantile estimator. Given 
the relative robustness of the GPD-based tail estimator to changes in k, the issue of 
threshold choice for this estimator seems less critical than for the Hill method. 
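
The Hill part of such a Monte Carlo experiment can be sketched as follows. This is a scaled-down illustration rather than the study reported above: it uses 200 replications instead of 1000 to keep the running time short, looks at a single value $k = 25$, and the function names and random seed are hypothetical. The $t_4$ variates are generated as a standard normal divided by the square root of an independent chi-squared variate with four degrees of freedom over four.

```python
import math
import random

def simulate_t4(n, rng):
    """Draw n iid Student t variates with four degrees of freedom."""
    # chi-squared(4) is Gamma with shape 2 and scale 2
    return [rng.gauss(0.0, 1.0) / math.sqrt(rng.gammavariate(2.0, 2.0) / 4.0)
            for _ in range(n)]

def hill_xi(data, k):
    """Hill estimate of xi = 1/alpha from the k largest observations."""
    top = sorted(data, reverse=True)[:k]
    logs = [math.log(x) for x in top]          # assumes the k largest values are positive
    return sum(logs) / k - logs[-1]

def monte_carlo_summary(k, n=1000, replications=200, true_xi=0.25, seed=0):
    """Monte Carlo estimates of bias, variance and MSE of the Hill estimator of xi."""
    rng = random.Random(seed)
    estimates = [hill_xi(simulate_t4(n, rng), k) for _ in range(replications)]
    mean = sum(estimates) / replications
    bias = mean - true_xi
    variance = sum((e - mean) ** 2 for e in estimates) / replications
    mse = sum((e - true_xi) ** 2 for e in estimates) / replications
    return bias, variance, mse

bias, variance, mse = monte_carlo_summary(k=25)
print(f"bias={bias:.4f}  variance={variance:.4f}  mse={mse:.4f}")
```

Because the variance is computed with divisor equal to the number of replications, the estimated MSE equals the squared estimated bias plus the estimated variance exactly, up to rounding. Repeating the call over a grid of $k$ values would trace out curves analogous to those for the Hill estimator in Figure 5.9.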


5.2.6 Conditional EVT for Financial Time Series 

The GPD method when applied to threshold exceedances in a financial return series 
(as in Examples 5.24 and 5.25) gives us risk-measure estimates for the stationary (or 
unconditional) distribution of the underlying time series. We now consider a simple 
adaptation of the GPD method that allows us to obtain risk-measure estimates for the
conditional (or one-step-ahead forecast) distribution of a time series. This adaptation
uses the GARCH model and related ideas in Chapter 4 and will be applied in the 
market-risk context in Chapter 9. 

We use the notation developed for prediction or forecasting in Sections 4.1.5
and 4.2.5. Let $X_{t-n+1}, \ldots, X_t$ denote a series of negative log-returns and assume
that these come from a strictly stationary time-series process $(X_t)$. Assume further
that the process satisfies equations of the form $X_t = \mu_t + \sigma_t Z_t$, where $\mu_t$ and $\sigma_t$
are $\mathcal{F}_{t-1}$-measurable and $(Z_t)$ are iid innovations with some unknown df $F_Z$; an
example would be an ARMA model with GARCH errors.

We want to obtain VaR and expected shortfall estimates for the conditional
distribution $F_{X_{t+1} \mid \mathcal{F}_t}$, and in Section 4.2.5 we showed that these risk measures are given
by the equations

$$\mathrm{VaR}^t_\alpha = \mu_{t+1} + \sigma_{t+1} q_\alpha(Z), \qquad \mathrm{ES}^t_\alpha = \mu_{t+1} + \sigma_{t+1} \mathrm{ES}_\alpha(Z),$$

where we write $Z$ for a generic rv with df $F_Z$.

These equations suggest an estimation method as follows. We first fit an ARMA- 
GARCH model by the QML procedure of Section 4.2.4 (since we do not wish to 
assume a particular innovation distribution) and use this to estimate $\mu_{t+1}$ and $\sigma_{t+1}$.
As an alternative we could use EWMA volatility forecasting. To estimate $q_\alpha(Z)$ and
$\mathrm{ES}_\alpha(Z)$ we essentially apply the GPD tail estimation procedure to the innovation
distribution $F_Z$. To get round the problem that we do not observe data directly from
the innovation distribution, we treat the residuals from the GARCH analysis as our
data and apply the GPD tail estimation method of Section 5.2.3 to the residuals.
In particular, we estimate $q_\alpha(Z)$ and $\mathrm{ES}_\alpha(Z)$ using the VaR and expected shortfall
formulas in (5.18) and (5.19). 
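
The final step of this procedure can be sketched in code. Formulas (5.18) and (5.19) are not reproduced in this passage, so the sketch assumes the standard GPD-based expressions $\mathrm{VaR}_\alpha = u + (\beta/\xi)\bigl(((1-\alpha)/\bar{F}(u))^{-\xi} - 1\bigr)$ and $\mathrm{ES}_\alpha = \mathrm{VaR}_\alpha/(1-\xi) + (\beta - \xi u)/(1-\xi)$, with $\bar{F}(u)$ estimated by $N_u/n$. The function names and all numerical inputs (GPD parameters for the residuals, one-step-ahead forecasts) are hypothetical placeholders, not fitted values.

```python
def gpd_var_es(alpha, u, xi, beta, n_exceed, n):
    """VaR and expected shortfall of Z from a GPD tail fit (needs xi < 1 and xi != 0)."""
    tail_fraction = n_exceed / n                   # empirical estimate of P(Z > u)
    if not 1.0 - alpha < tail_fraction:
        raise ValueError("alpha must correspond to a quantile above the threshold u")
    var = u + (beta / xi) * (((1.0 - alpha) / tail_fraction) ** (-xi) - 1.0)
    es = var / (1.0 - xi) + (beta - xi * u) / (1.0 - xi)
    return var, es

def conditional_risk_measures(mu_next, sigma_next, q_z, es_z):
    """Scale innovation risk measures by the one-step-ahead mean and volatility forecasts."""
    return mu_next + sigma_next * q_z, mu_next + sigma_next * es_z

# Hypothetical GPD fit to standardized residuals and hypothetical forecasts.
q_z, es_z = gpd_var_es(alpha=0.99, u=1.2, xi=0.2, beta=0.6, n_exceed=100, n=1000)
var_t, es_t = conditional_risk_measures(mu_next=0.0005, sigma_next=0.015,
                                        q_z=q_z, es_z=es_z)
```

In an application, `mu_next` and `sigma_next` would come from the fitted ARMA-GARCH (or EWMA) forecast and the GPD parameters from a fit to the residuals.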


Notes and Comments 


The ideas behind the important Theorem 5.20, which underlies GPD modelling, may 
be found in Pickands (1975) and Balkema and de Haan (1974). Important papers 
developing the technique in the statistical literature are Davison (1984) and Davison 
and Smith (1990). The estimation of the parameters of the GPD, both by ML and by 
the method of probability-weighted moments, is discussed in Hosking and Wallis 
(1987). The tail estimation formula (5.21) was suggested by Smith (1987), and the 
theoretical properties of this estimator for iid data in the domain of attraction of an 
extreme value distribution are extensively investigated in that paper. The Danish fire 
loss example is taken from McNeil (1997). 

The Hill estimator goes back to Hill (1975) (see also Hall 1982). The theoretical 
properties for dependent data, including linear processes with heavy-tailed innova- 
tions and ARCH and GARCH processes, were investigated by Resnick and Starica 
(1995, 1996). The idea of smoothing the estimator is examined in Resnick and 
Starica (1997) and Resnick (1997). For Hill “horror plots”, showing situations when 
the Hill estimator delivers particularly poor estimates of the tail index, see pp. 194, 
270 and 343 of EKM.

Alternative estimators based on order statistics include the estimator of Pickands
(1975), which is also discussed in Dekkers and de Haan (1989), and the DEdH
estimator of Dekkers, Einmahl and de Haan (1989). This latter estimator is used as
the basis of a quantile estimator in de Haan and Rootzén (1993). Both the Pickands
and DEdH estimators are designed to estimate general $\xi$ in the extreme value limit (in
contrast to the Hill estimator, which is designed for positive $\xi$); in empirical studies
the DEdH estimator seems to work better than the Pickands estimator. The issue 
of the optimal number of order statistics in such estimators is taken up in a series 
of papers by Dekkers and de Haan (1993) and Daníelsson et al. (2001a). A method 
is proposed that is essentially based on the bootstrap approach to estimating mean 
squared error discussed in Hall (1990). A review paper relevant for applications to 
insurance and finance is Matthys and Beirlant (2000). 

Analyses of the tails of financial data using methods based on the Hill estimator can 
be found in Koedijk, Schafgans and de Vries (1990), Lux (1996) and various papers 
by Daníelsson and de Vries (1997a,b,c). The conditional EVT method was developed
in McNeil and Frey (2000); a Monte Carlo method using the GPD model to estimate
risk measures for the h-day loss distribution is also described. See also Gençay,
Selçuk and Ulugülyağci (2003), Gençay and Selçuk (2004) and Chavez-Demoulin,
Embrechts and Sardy (2014) for interesting applications of EVT methodology to 
financial time series and VaR estimation. 


5.3 Point Process Models 


In our discussion of threshold models in Section 5.2 we considered only the magni- 
tude of excess losses over high thresholds. In this section we consider exceedances 
of thresholds as events in time and use a point process approach to model the occur- 
rence of these events. We begin by looking at the case of regularly spaced iid data 
and discuss the well-known POT model for the occurrence of extremes in such data; 
this model elegantly subsumes the models for maxima and the GPD models for 
excess losses that we have so far described. 

However, the assumptions of the standard POT model are typically violated by 
financial return series, because of the kind of serial dependence that volatility clus- 
tering generates in such data. Our ultimate aim is to find more general point process 
models to describe the occurrence of extreme values in financial time series, and we 
find suitable candidates in the class of self-exciting point processes. These models 
are of a dynamic nature and can be used to estimate conditional VaRs; they offer an 
interesting alternative to the conditional EVT approach of Section 5.2.6, with the 
advantage that no pre-whitening of data with GARCH processes is required. 

The following section gives an idea of the theory behind the POT model, but it 
may be skipped by readers who are content to go directly to a description of the 
standard POT model in Section 5.3.2. 


5.3.1 Threshold Exceedances for Strict White Noise 


Consider a strict white noise process $(X_i)_{i \in \mathbb{N}}$ representing financial losses. While we
discuss the theory for iid variables for simplicity, the results we describe also hold for
dependent processes with extremal index $\theta = 1$, i.e. processes where extreme values
show no tendency to cluster (see Section 5.1.3 for examples of such processes).
Throughout this section we assume that the common loss distribution is in the
maximum domain of attraction of an extreme value distribution (MDA($H_\xi$)) so
that (5.1) holds for the non-degenerate limiting distribution $H_\xi$ and normalizing
sequences $c_n$ and $d_n$. From (5.1) it follows, by taking logarithms, that for any fixed
$x$ we have

$$\lim_{n \to \infty} n \ln F(c_n x + d_n) = \ln H_\xi(x). \qquad (5.25)$$

Throughout this section we also consider a sequence of thresholds $(u_n(x))$ defined
by $u_n(x) := c_n x + d_n$ for some fixed value of $x$. By noting that $-\ln y \sim 1 - y$ as
$y \to 1$, we can infer from (5.25) that $n \bar{F}(u_n(x)) \sim -n \ln F(u_n(x)) \to -\ln H_\xi(x)$
as $n \to \infty$ for this sequence of thresholds.

The number of losses in the sample $X_1, \ldots, X_n$ exceeding the threshold $u_n(x)$ is
a binomial rv, $N_{u_n(x)} \sim B(n, \bar{F}(u_n(x)))$, with expectation $n \bar{F}(u_n(x))$. Since (5.25)
holds, the standard Poisson limit result implies that, as $n \to \infty$, the number
of exceedances $N_{u_n(x)}$ converges to a Poisson rv with mean $\lambda(x) = -\ln H_\xi(x)$,
depending on the particular value of $x$ chosen.
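
The Poisson limit invoked here is easy to check numerically. In the sketch below the exceedance probability is taken to be exactly $\lambda/n$ (a simplifying assumption standing in for $\bar{F}(u_n(x))$, which only satisfies $n\bar{F}(u_n(x)) \to \lambda$), and the value $\lambda = 3$ and the function names are hypothetical.

```python
import math

def binomial_pmf(k, n, p):
    """P(N = k) for N ~ B(n, p); zero automatically when k > n."""
    return math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)

def poisson_pmf(k, lam):
    """P(N = k) for N ~ Poisson(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def max_pmf_gap(n, lam, k_max=20):
    """Largest absolute difference between the B(n, lam/n) and Poisson(lam) pmfs on 0..k_max."""
    p = lam / n
    return max(abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam))
               for k in range(k_max + 1))

lam = 3.0
gaps = {n: max_pmf_gap(n, lam) for n in (10, 100, 1000, 10000)}
for n, gap in gaps.items():
    print(f"n={n:6d}  max pmf difference={gap:.2e}")
```

The differences shrink as $n$ grows, in line with the limiting Poisson distribution of the number of exceedances.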

The theory goes further. Not only is the number of exceedances asymptotically 
Poisson, these exceedances occur according to a Poisson point process. To state the 
result it is useful to give a brief summary of some ideas concerning point processes. 


On point processes. Suppose we have a sequence of rvs or vectors $Y_1, \ldots, Y_n$
taking values in some state space $\mathcal{X}$ (for example, $\mathbb{R}$ or $\mathbb{R}^2$) and we define, for any
set $A \subset \mathcal{X}$, the rv

$$N(A) = \sum_{i=1}^{n} I_{\{Y_i \in A\}}, \qquad (5.26)$$

which counts the random number of $Y_i$ in the set $A$. Under some technical conditions
(see EKM, pp. 220-223), (5.26) is said to define a point process $N(\cdot)$. An example
of a point process is the Poisson point process.


Definition 5.29 (Poisson point process). The point process $N(\cdot)$ is called a Poisson
point process (or Poisson random measure) on $\mathcal{X}$ with intensity measure $\Lambda$ if the
following two conditions are satisfied.

(a) For $A \subset \mathcal{X}$ and $k \ge 0$,

$$P(N(A) = k) = \begin{cases} e^{-\Lambda(A)} \dfrac{\Lambda(A)^k}{k!}, & \Lambda(A) < \infty, \\ 0, & \Lambda(A) = \infty. \end{cases}$$

(b) For any $m \ge 1$, if $A_1, \ldots, A_m$ are mutually disjoint subsets of $\mathcal{X}$, then the
rvs $N(A_1), \ldots, N(A_m)$ are independent.


The intensity measure $\Lambda(\cdot)$ of $N(\cdot)$ is also known as the mean measure because
$E(N(A)) = \Lambda(A)$. We also speak of the intensity function (or simply intensity)
of the process, which is the derivative $\lambda(x)$ of the measure satisfying
$\Lambda(A) = \int_A \lambda(x)\, dx$.

Asymptotic behaviour of the point process of exceedances. Consider again the strict
white noise $(X_i)_{i \in \mathbb{N}}$ and the sequence of thresholds $u_n(x) = c_n x + d_n$ for some
fixed $x$. For $n \in \mathbb{N}$ and $1 \le i \le n$ let $Y_{i,n} = (i/n) I_{\{X_i > u_n(x)\}}$ and observe that $Y_{i,n}$
can be thought of as returning either the normalized “time” $i/n$ of an exceedance,
or zero. The point process of exceedances of the threshold $u_n$ is the process $N_n(\cdot)$
with state space $\mathcal{X} = (0, 1]$, which is given by

$$N_n(A) = \sum_{i=1}^{n} I_{\{Y_{i,n} \in A\}} \qquad (5.27)$$

for $A \subset \mathcal{X}$. As the notation indicates, we consider this process to be an element
in a sequence of point processes indexed by $n$. The point process (5.27) counts
the exceedances with time of occurrence in the set $A$, and we are interested in the
behaviour of this process as $n \to \infty$.

It may be shown (see Theorem 5.3.2 in EKM) that $N_n(\cdot)$ converges in distribution
on $\mathcal{X}$ to a Poisson point process $N(\cdot)$ with intensity measure $\Lambda(\cdot)$ satisfying
$\Lambda(A) = (t_2 - t_1)\lambda(x)$ for $A = (t_1, t_2) \subset \mathcal{X}$, where $\lambda(x) = -\ln H_\xi(x)$ as before. This implies,
in particular, that $E(N_n(A)) \to E(N(A)) = \Lambda(A) = (t_2 - t_1)\lambda(x)$. Clearly, the
intensity does not depend on time and takes the constant value $\lambda := \lambda(x)$; we refer
to the limiting process as a homogeneous Poisson process with intensity or rate $\lambda$.


Application of the result in practice. We give a heuristic argument explaining how
this limiting result is used in practice. We consider a fixed large sample size $n$ and a
fixed high threshold $u$, which we assume satisfies $u = c_n y + d_n$ for some value $y$.
We expect that the number of threshold exceedances can be approximated by a
Poisson rv and that the point process of exceedances of $u$ can be approximated by a
homogeneous Poisson process with rate $\lambda = -\ln H_\xi(y) = -\ln H_\xi((u - d_n)/c_n)$. If
we replace the normalizing constants $c_n$ and $d_n$ by $\sigma > 0$ and $\mu$, we have a Poisson
process with rate $-\ln H_{\xi,\mu,\sigma}(u)$. Clearly, we could repeat the same argument with
any high threshold so that, for example, we would expect it to be approximately true
that exceedances of the level $x \ge u$ occur according to a Poisson process with rate
$-\ln H_{\xi,\mu,\sigma}(x)$.

We therefore have an intimate relationship between the GEV model for block 
maxima and a Poisson model for the occurrence in time of exceedances of a high 
threshold. The arguments of this section therefore provide theoretical support for 
the observation in Figure 3.3: that exceedances for simulated iid t data are separated 
by waiting times that behave like iid exponential observations. 
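
The exponential behaviour of the waiting times can be illustrated with a small simulation. The sketch below is an assumption-laden stand-in for the experiment behind Figure 3.3: it uses iid Pareto-type losses with tail $x^{-2}$ rather than $t$ data, a threshold exceeded with probability 0.01, and hypothetical names and seed. For an exponential waiting time with mean $m$, the probability of exceeding $m$ is $e^{-1} \approx 0.37$, which gives a simple check.

```python
import math
import random

def exceedance_gaps(series, threshold):
    """Numbers of time steps between successive exceedances of the threshold."""
    times = [i for i, x in enumerate(series) if x > threshold]
    return [b - a for a, b in zip(times, times[1:])]

rng = random.Random(42)
n, p = 100_000, 0.01
# iid losses X = U^(-1/2) with U uniform on (0, 1], so P(X > x) = x^(-2) for x >= 1
series = [(1.0 - rng.random()) ** -0.5 for _ in range(n)]
threshold = p ** -0.5                       # exceeded with probability p
gaps = exceedance_gaps(series, threshold)

mean_gap = sum(gaps) / len(gaps)            # should be close to 1/p = 100
share_above_mean = sum(g > mean_gap for g in gaps) / len(gaps)
print(f"mean gap={mean_gap:.1f}  share above mean={share_above_mean:.3f}  "
      f"exp(-1)={math.exp(-1):.3f}")
```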


5.3.2 The POT Model 


The theory of the previous section combined with the theory of Section 5.2 suggests
an asymptotic model for threshold exceedances in regularly spaced iid data (or data
from a process with extremal index $\theta = 1$). The so-called POT model makes the
following assumptions.

• Exceedances occur according to a homogeneous Poisson process in time.

• Excess amounts above the threshold are iid and independent of exceedance
times.

• The distribution of excess amounts is generalized Pareto.

There are various alternative ways of describing this model. It might also be called a 
marked Poisson point process, where the exceedance times constitute the points 
and the GPD-distributed excesses are the marks. It can also be described as a 
(non-homogeneous) two-dimensional Poisson point process, where points (t, x) 
in two-dimensional space record times and magnitudes of exceedances. The latter 
representation is particularly powerful, as we now discuss. 


Two-dimensional Poisson formulation of POT model. Assume that we have regularly
spaced random losses $X_1, \ldots, X_n$ and that we set a high threshold $u$. We
assume that, on the state space $\mathcal{X} = (0, 1] \times (u, \infty)$, the point process defined by
$N(A) = \sum_{i=1}^{n} I_{\{(i/n, X_i) \in A\}}$ is a Poisson process with intensity at a point $(t, x)$ given
by

$$\lambda(t, x) = \frac{1}{\sigma} \left( 1 + \xi \frac{x - \mu}{\sigma} \right)^{-1/\xi - 1} \qquad (5.28)$$

provided $(1 + \xi(x - \mu)/\sigma) > 0$, and by $\lambda(t, x) = 0$ otherwise. Note that this intensity
does not depend on $t$ but does depend on $x$, and hence the two-dimensional Poisson
process is non-homogeneous; we simplify the notation to $\lambda(x) := \lambda(t, x)$. For a set
of the form $A = (t_1, t_2) \times (x, \infty) \subset \mathcal{X}$, the intensity measure is

$$\Lambda(A) = \int_{t_1}^{t_2} \int_x^\infty \lambda(y)\, dy\, dt = -(t_2 - t_1) \ln H_{\xi,\mu,\sigma}(x). \qquad (5.29)$$


It follows from (5.29) that for any $x \ge u$, the implied one-dimensional process of
exceedances of the level $x$ is a homogeneous Poisson process with rate
$\tau(x) := -\ln H_{\xi,\mu,\sigma}(x)$. Now consider the excess amounts over the threshold $u$. The tail of
the excess df over the threshold $u$, denoted by $\bar{F}_u(x)$ before, can be calculated as
the ratio of the rates of exceeding the levels $u + x$ and $u$. We obtain

$$\bar{F}_u(x) = \frac{\tau(u + x)}{\tau(u)} = \left( 1 + \frac{\xi x}{\sigma + \xi(u - \mu)} \right)^{-1/\xi} = \bar{G}_{\xi,\beta}(x)$$

for a positive scaling parameter $\beta = \sigma + \xi(u - \mu)$. This is precisely the tail of
the GPD model for excesses over the threshold $u$ used in Section 5.2.2. Thus this
seemingly complicated model is indeed the POT model described informally at the
beginning of this section.
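
The identity between the ratio of exceedance rates and the GPD tail can be confirmed numerically. The sketch below evaluates both sides for a few excess amounts; the parameter values and function names are hypothetical and the code assumes $\xi \neq 0$.

```python
def exceedance_rate(x, xi, mu, sigma):
    """tau(x) = -ln H_{xi,mu,sigma}(x), for xi != 0 and 1 + xi (x - mu)/sigma > 0."""
    return (1.0 + xi * (x - mu) / sigma) ** (-1.0 / xi)

def gpd_tail(x, xi, beta):
    """Tail of the GPD with shape xi != 0 and scale beta."""
    return (1.0 + xi * x / beta) ** (-1.0 / xi)

xi, mu, sigma, u = 0.25, 1.0, 2.0, 5.0      # hypothetical parameter values
beta = sigma + xi * (u - mu)                # implied GPD scale, here 3.0
checks = [
    (exceedance_rate(u + x, xi, mu, sigma) / exceedance_rate(u, xi, mu, sigma),
     gpd_tail(x, xi, beta))
    for x in (0.5, 1.0, 5.0, 20.0)
]
for ratio, tail in checks:
    print(f"rate ratio={ratio:.10f}  GPD tail={tail:.10f}")
```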

Note also that the model implies the GEV distributional model for maxima. To
see this, consider the event $\{M_n \le x\}$ for some value $x \ge u$. This may be
expressed in point process language as the event that there are no points in the set
$A = (0, 1] \times (x, \infty)$. The probability of this event is calculated to be
$P(M_n \le x) = P(N(A) = 0) = e^{-\Lambda(A)} = H_{\xi,\mu,\sigma}(x)$, $x \ge u$, which is precisely the GEV model
for maxima of $n$-blocks used in Section 5.1.4.

Statistical estimation of the POT model. The most elegant way of fitting the POT
model to data is to fit the point process with intensity (5.28) to the exceedance data
in one step. Given the exceedance data $\{\tilde{X}_j : j = 1, \ldots, N_u\}$, the likelihood can be
written as

$$L(\xi, \sigma, \mu; \tilde{X}_1, \ldots, \tilde{X}_{N_u}) = e^{-\tau(u)} \prod_{j=1}^{N_u} \lambda(\tilde{X}_j). \qquad (5.30)$$

Parameter estimates of $\xi$, $\sigma$ and $\mu$ are obtained by maximizing this expression,
which is easily accomplished by numerical means. For literature on the derivation
of this likelihood, see Notes and Comments.

There are, however, simpler ways of getting the same parameter estimates. Suppose
we reparametrize the POT model in terms of $\tau := \tau(u) = -\ln H_{\xi,\mu,\sigma}(u)$,
the rate of the one-dimensional Poisson process of exceedances of the level $u$, and
$\beta = \sigma + \xi(u - \mu)$, the scaling parameter of the implied GPD for the excess losses
over $u$. The intensity in (5.28) can then be rewritten as

$$\lambda(x) = \lambda(t, x) = \frac{\tau}{\beta} \left( 1 + \xi \frac{x - u}{\beta} \right)^{-1/\xi - 1}, \qquad (5.31)$$

where $\xi \in \mathbb{R}$ and $\tau, \beta > 0$. Using this parametrization it is easily verified that the
log of the likelihood in (5.30) becomes

$$\ln L(\xi, \sigma, \mu; \tilde{X}_1, \ldots, \tilde{X}_{N_u}) = \ln L_1(\xi, \beta; \tilde{X}_1 - u, \ldots, \tilde{X}_{N_u} - u) + \ln L_2(\tau; N_u),$$

where $L_1$ is precisely the likelihood for fitting a GPD to excess losses given in (5.14),
and $\ln L_2(\tau; N_u) = -\tau + N_u \ln \tau$, which is the log-likelihood for a one-dimensional
homogeneous Poisson process with rate $\tau$. Such a partition of a log-likelihood into a
sum of two terms involving two different sets of parameters means that we can make
separate inferences about the two sets of parameters; we can estimate $\xi$ and $\beta$ in a
GPD analysis and then estimate $\tau$ by its MLE $N_u$ and use these to infer estimates
of $\mu$ and $\sigma$.
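
The factorization of the log-likelihood, and the step from $(\xi, \beta, \tau)$ back to $(\mu, \sigma)$, can be checked numerically. Solving $\tau = (\beta/\sigma)^{-1/\xi}$ and $\beta = \sigma + \xi(u - \mu)$ gives $\sigma = \beta \tau^{\xi}$ and $\mu = u - (\beta - \sigma)/\xi$. The sketch below assumes $\xi \neq 0$ and takes the GPD log-likelihood to be $-N_u \ln \beta - (1 + 1/\xi) \sum_j \ln(1 + \xi y_j/\beta)$; the function names, parameter values and four exceedances are hypothetical.

```python
import math

def pot_loglik(xi, mu, sigma, u, exceedances):
    """Log of (5.30): -tau(u) plus the sum of ln lambda(x_j), lambda as in (5.28)."""
    tau_u = (1.0 + xi * (u - mu) / sigma) ** (-1.0 / xi)
    log_intensities = [
        -math.log(sigma) - (1.0 / xi + 1.0) * math.log(1.0 + xi * (x - mu) / sigma)
        for x in exceedances
    ]
    return -tau_u + sum(log_intensities)

def gpd_loglik(xi, beta, excesses):
    """GPD log-likelihood for excess losses (xi != 0)."""
    return (-len(excesses) * math.log(beta)
            - (1.0 + 1.0 / xi) * sum(math.log(1.0 + xi * y / beta) for y in excesses))

def poisson_loglik(tau, n_u):
    """Log-likelihood of a homogeneous Poisson process with rate tau on (0, 1]."""
    return -tau + n_u * math.log(tau)

def to_pot_parameters(xi, beta, tau, u):
    """Recover (mu, sigma) from the GPD/Poisson parametrization."""
    sigma = beta * tau ** xi
    mu = u - (beta - sigma) / xi
    return mu, sigma

xi, beta, u = 0.2, 1.5, 3.0                 # hypothetical GPD fit and threshold
exceedances = [3.4, 3.9, 5.2, 8.1]          # hypothetical losses above u
tau = float(len(exceedances))               # MLE of tau is N_u
mu, sigma = to_pot_parameters(xi, beta, tau, u)

joint = pot_loglik(xi, mu, sigma, u, exceedances)
split = gpd_loglik(xi, beta, [x - u for x in exceedances]) + poisson_loglik(tau, len(exceedances))
```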


Advantages of the POT model formulation. One might ask what the advantages are of
approaching the modelling of extremes through the two-dimensional Poisson point
process model described by the intensity (5.28). One advantage is the fact
that the parameters $\xi$, $\mu$ and $\sigma$ in the Poisson point process model do not have any
theoretical dependence on the threshold chosen, unlike the parameter $\beta$ in the GPD
model, which appears in the theory as a function of the threshold $u$. In practice, we
would expect the estimated parameters of the Poisson model to be roughly stable
over a range of high thresholds, whereas the estimated $\beta$ parameter varies with
threshold choice.

For this reason the intensity (5.28) is a framework that is often used to introduce
covariate effects into extreme value modelling. One method of doing this is to replace
the parameters $\mu$ and $\sigma$ in (5.28) by parameters that vary over time as a function
of deterministic covariates. For example, we might have $\mu(t) = \alpha + \gamma' y(t)$, where
$y(t)$ represents a vector of covariate values at time $t$. This would give us Poisson
processes that are also non-homogeneous in time.

Applicability of the POT model to return series data. We now turn to the use of
the POT model with financial return data. An initial comment is that returns do 
not really form genuine point events in time, in contrast to recorded water levels 
or wind speeds, for example. Returns are discrete-time measurements that describe 
changes in value taking place over the course of, say, a day or a week. Nonetheless, 
we assume that if we take a longer-term perspective, such data can be approximated 
by point events in time. 

In Section 3.1 and in Figure 3.3 in particular, we saw evidence that, in contrast 
to iid data, exceedances of a high threshold for daily financial return series do not 
necessarily occur according to a homogeneous Poisson process. They tend instead 
to form clusters corresponding to episodes of high volatility. Thus the standard POT 
model is not directly applicable to financial return data. 

Theory suggests that for stochastic processes with extremal index $\theta < 1$, such
as GARCH processes, the extremal clusters themselves should occur according to 
a homogeneous Poisson process in time, so that the individual exceedances occur 
according to a Poisson cluster process (see, for example, Leadbetter 1991). A suitable 
model for the occurrence and magnitude of exceedances in a financial return series 
might therefore be some form of marked Poisson cluster process. 

Rather than attempting to specify the mechanics of cluster formation, it is quite 
common to try to circumvent the problem by declustering financial return data: we 
attempt to formally identify clusters of exceedances and then we apply the POT 
model to cluster maxima only. This method is obviously somewhat ad hoc, as there 
is usually no clear way of deciding where one cluster ends and another begins. A 
possible declustering algorithm is given by the runs method. In this method a run 
size r is fixed and two successive exceedances are said to belong to two different 
clusters if they are separated by a run of at least r values below the threshold (see 
EKM, pp. 422-424). In Figure 5.11 the DAX daily negative returns of Figure 3.3 
have been declustered with a run length of ten trading days; this reduces the 100 
exceedances to 42 cluster maxima. 
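
The runs method is simple to sketch in code. The following Python function is a minimal illustration under stated assumptions, not the implementation behind Figure 5.11: the name `runs_decluster` is hypothetical, and an exceedance is taken to be a value strictly above the threshold.

```python
def runs_decluster(x, threshold, r):
    """Return (index, value) of each cluster maximum under the runs method.

    Two successive exceedances belong to different clusters if they are
    separated by a run of at least r values that do not exceed the threshold.
    """
    clusters = []      # each cluster is a list of exceedance indices
    last_exc = None    # index of the most recent exceedance
    for i, value in enumerate(x):
        if value <= threshold:
            continue
        # Length of the run of non-exceedances since the previous exceedance.
        gap = None if last_exc is None else i - last_exc - 1
        if gap is None or gap >= r:
            clusters.append([i])      # start a new cluster
        else:
            clusters[-1].append(i)    # extend the current cluster
        last_exc = i
    # Keep only the largest exceedance of each cluster.
    return [max(((i, x[i]) for i in c), key=lambda p: p[1]) for c in clusters]


losses = [0.0, 3.0, 4.0, 0.0, 0.0, 5.0, 0.0, 0.0, 0.0, 6.0, 0.0]
print(runs_decluster(losses, threshold=2.0, r=3))  # [(5, 5.0), (9, 6.0)]
```

In this toy series the exceedances at positions 1, 2 and 5 form one cluster because the gaps between them are shorter than the run length, whereas the exceedance at position 9 starts a new cluster. A smaller run length produces more clusters, which illustrates the ad hoc nature of the method.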

However, it is not clear that applying the POT model to declustered data gives us 
a particularly useful model. We can estimate the rate of occurrence of clusters of 
extremes and say something about average cluster size; we can also derive a GPD 
model for excess losses over thresholds for cluster maxima (where standard errors 
for parameters may be more realistic than if we fitted the GPD to the dependent 
sample of all threshold exceedances). However, by neglecting the modelling of 
cluster formation, we cannot make more dynamic statements about the intensity 
of occurrence of threshold exceedances at any point in time. In Section 16.2 we 
describe self-exciting point process models, which do attempt to model the dynamics 
of cluster formation. 


Figure 5.11. (a) DAX daily negative returns and a Q-Q plot of their spacings as in Figure 3.3.
(b) Data have been declustered with the runs method using a run length of ten trading days.
The spacings of the 42 cluster maxima are more consistent with a Poisson model.


Example 5.30 (POT analysis of AT&T weekly losses). We close this section with
an example of a standard POT model applied to extremes in financial return data. To
mitigate the clustering phenomenon discussed above we use weekly return data, as
previously analysed in Examples 5.24 and 5.25. Recall that these yield 102 weekly
percentage losses for the AT&T stock price exceeding a threshold of 2.75%. The
data are shown in Figure 5.12, where we observe that the inter-exceedance times
seem to have a roughly exponential distribution, although the discrete nature of the
times and the relatively low value of n mean that there are some tied values for the
spacings, which makes the plot look a little granular. Another noticeable feature is
that the exceedances of the threshold appear to become more frequent over time, 
which might be taken as evidence against the homogeneous Poisson assumption for 
threshold exceedances and against the implicit assumption that the underlying data 
form a realization from a stationary time series. It would be possible to consider a 
POT model incorporating a trend of increasingly frequent exceedances, but we will 
not go this far. 

Figure 5.12. (a) Time series of AT&T weekly percentage losses from 1991 to 2000.
(b) Corresponding realization of the marked point process of exceedances of the threshold 2.75%.
(c) Q-Q plot of inter-exceedance times against an exponential reference distribution. See
Example 5.30 for details.


We fit the standard two-dimensional Poisson model to the 102 exceedances of
the threshold 2.75% using the likelihood in (5.30) and obtain parameter estimates
$\hat{\xi} = 0.22$, $\hat{\mu} = 19.9$ and $\hat{\sigma} = 5.95$. The implied GPD scale parameter for the
distribution of excess losses over the threshold $u$ is $\hat{\beta} = \hat{\sigma} + \hat{\xi}(u - \hat{\mu}) = 2.1$, so we
have exactly the same estimates of $\xi$ and $\beta$ as in Example 5.24.

The estimated exceedance rate for the threshold $u = 2.75$ is given by
$\hat{\tau}(u) = -\ln H_{\hat{\xi},\hat{\mu},\hat{\sigma}}(u) = 102$, which is precisely the number of exceedances of that
threshold, as theory suggests. It is of more interest to look at estimated exceedance rates
for higher thresholds. For example, we get $\hat{\tau}(15) = 2.50$, which implies that losses
exceeding 15% occur as a Poisson process with rate 2.5 losses per ten-year period,
so that such a loss is, roughly speaking, a four-year event. Thus the Poisson model
gives us an alternative method of defining the return period of a stress event and a
more powerful way of calculating such a risk measure. Similarly, we can invert the
problem to estimate return levels: suppose we define the ten-year return level as that
level which is exceeded according to a Poisson process with rate one loss per ten
years, then we can easily estimate the level in our model by calculating

$$H^{-1}_{\hat{\xi},\hat{\mu},\hat{\sigma}}(\exp(-1)) = 19.9,$$

so the ten-year event is a weekly loss of roughly 20%. Using the profile likelihood
method in Section A.3.5 we could also give confidence intervals for such estimates.
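
The arithmetic in this example can be retraced with a few lines of Python. The sketch below assumes the standard GEV form $H_{\xi,\mu,\sigma}(x) = \exp(-(1 + \xi(x - \mu)/\sigma)^{-1/\xi})$ for $\xi \neq 0$ and measures time in units of the ten-year observation period; the function names are hypothetical. Because it uses the rounded estimates quoted above, it reproduces the reported figures only approximately.

```python
import math


def exceedance_rate(x, xi, mu, sigma):
    """tau(x) = -ln H_{xi,mu,sigma}(x) = (1 + xi * (x - mu) / sigma) ** (-1 / xi), xi != 0."""
    return (1.0 + xi * (x - mu) / sigma) ** (-1.0 / xi)


def return_level(rate, xi, mu, sigma):
    """Level x solving tau(x) = rate, i.e. the GEV quantile H^{-1}(exp(-rate))."""
    return mu + sigma / xi * (rate ** (-xi) - 1.0)


# Rounded parameter estimates from Example 5.30.
xi, mu, sigma = 0.22, 19.9, 5.95
u = 2.75

beta = sigma + xi * (u - mu)                    # implied GPD scale at threshold u
rate_u = exceedance_rate(u, xi, mu, sigma)      # expected exceedances of u per ten years
rate_15 = exceedance_rate(15.0, xi, mu, sigma)  # expected exceedances of 15% per ten years
level_10y = return_level(1.0, xi, mu, sigma)    # ten-year return level

print(f"beta = {beta:.2f}, tau(u) = {rate_u:.1f}, tau(15) = {rate_15:.2f}")
print(f"ten-year return level = {level_10y:.1f}, return period of 15% loss = {10 / rate_15:.1f} years")
```

Setting the rate equal to one makes the bracket in the inversion formula vanish, so the ten-year return level coincides with the location parameter $\hat{\mu}$.
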
Notes and Comments 


For more information about point processes consult EKM, Cox and Isham (1980), 
Kallenberg (1983) and Resnick (2008). The point process approach to extremes 
dates back to Pickands (1971) and is also discussed in Leadbetter, Lindgren and
Rootzén (1983), Leadbetter (1991) and Falk, Hüsler and Reiss (1994).

The two-dimensional Poisson point process model was first used in practice by 
Smith (1989) and may also be found in Smith and Shively (1995); both these papers 
discuss the adaptation of the point process model to incorporate covariates or time 
trends in the context of environmental data. An insurance application is treated 
in Smith and Goodman (2000), which also treats the point process model from a 
Bayesian perspective. An interesting application to wind storm losses is Rootzén 
and Tajvidi (1997). A further application of the bivariate point process framework 
to the modeling of insurance loss data, showing trends in both intensity and sever- 
ity of occurrence, is found in McNeil and Saladin (2000). For further applica- 
tions to insurance and finance, see Chavez-Demoulin and Embrechts (2004) and 
Chavez-Demoulin, Embrechts and Hofert (2014). An excellent overview of statis- 
tical approaches to the GPD and point process models is found in Coles (2001). 

The derivation of likelihoods for point process models is beyond the scope of this 
book and we have simply recorded the likelihoods to be maximized without further 
justification. See Daley and Vere-Jones (2003, Chapter 7) for more details on this 
subject; see also Coles (2001, p. 127) for a good intuitive account in the Poisson 
case. 


6 


Multivariate Models 


Financial risk models, whether for market or credit risks, are inherently multivariate. 
The value change of a portfolio of traded instruments over a fixed time horizon 
depends on a random vector of risk-factor changes or returns. The loss incurred by 
a credit portfolio depends on a random vector of losses for the individual counter- 
parties in the portfolio. 

This chapter is the first of two successive ones that focus on models for random 
vectors. The emphasis in this chapter is on tractable models that describe both the 
individual behaviour of the components of a random vector and their joint behaviour 
or dependence structure. We consider a number of distributions that extend the 
multivariate normal but provide more realistic models for many kinds of financial 
data. 

In Chapter 7 we focus explicitly on modelling the dependence structure of a ran- 
dom vector and largely ignore marginal behaviour. We introduce copula models of 
dependence and study a number of dependence measures and concepts related to 
copulas. Both Chapter 6 and Chapter 7 take a static, distributional view of multi- 
variate modelling; for multivariate time-series models, see Chapter 14. 

Section 6.1 reviews basic ideas in multivariate statistics and discusses the multi- 
variate normal (or Gaussian) distribution and its deficiencies as a model for empirical 
return data. 

In Section 6.2 we consider a generalization of the multivariate normal distribution 
known as a multivariate normal mixture distribution, which shares much of the 
structure of the multivariate normal and retains many of its properties. We treat 
both variance mixtures, which belong to the wider class of elliptical distributions, 
and mean-variance mixtures, which allow asymmetry. Concrete examples include 
t distributions and generalized hyperbolic distributions, and we show in empirical 
examples that these models provide a better fit than a Gaussian distribution to asset 
return data. In some cases, multivariate return data are not strongly asymmetric and 
models from the class of elliptical distributions are good enough; in Section 6.3 we 
investigate the elegant properties of these distributions. 

In Section 6.4 we discuss the important issue of dimension-reduction techniques 
for reducing large sets of risk factors to smaller subsets of essential risk drivers. The 
key idea here is that of a factor model, and we also review the principal components 
method of constructing factors. 

6.1 Basics of Multivariate Modelling


This first section reviews important basic material from multivariate statistics, which 
will be known to many readers. The main topic of the section is the multivariate 
normal distribution and its properties; this distribution is central to much of classical 
multivariate analysis and was the starting point for attempts to model market risk 
(the variance-covariance method of Section 9.2.2).


6.1.1 Random Vectors and Their Distributions 


Joint and marginal distributions. Consider a general d-dimensional random vector
of risk-factor changes (or so-called returns) $\mathbf{X} = (X_1, \dots, X_d)'$. The distribution
of $\mathbf{X}$ is completely described by the joint distribution function (df)

$$F_{\mathbf{X}}(\mathbf{x}) = F_{\mathbf{X}}(x_1, \dots, x_d) = P(\mathbf{X} \le \mathbf{x}) = P(X_1 \le x_1, \dots, X_d \le x_d).$$

Where no ambiguity arises we simply write $F$, omitting the subscript.

The marginal df of $X_i$, written $F_{X_i}$ or often simply $F_i$, is the df of that risk factor
considered individually and is easily calculated from the joint df. For all $i$ we have

$$F_i(x_i) = P(X_i \le x_i) = F(\infty, \dots, \infty, x_i, \infty, \dots, \infty). \tag{6.1}$$


If the marginal df $F_i(x)$ is absolutely continuous, then we refer to its derivative $f_i(x)$
as the marginal density of $X_i$. It is also possible to define k-dimensional marginal
distributions of $\mathbf{X}$ for $2 \le k \le d - 1$. Suppose we partition $\mathbf{X}$ into $(\mathbf{X}_1', \mathbf{X}_2')'$, where
$\mathbf{X}_1 = (X_1, \dots, X_k)'$ and $\mathbf{X}_2 = (X_{k+1}, \dots, X_d)'$, then the marginal df of $\mathbf{X}_1$ is

$$F_{\mathbf{X}_1}(\mathbf{x}_1) = P(\mathbf{X}_1 \le \mathbf{x}_1) = F(x_1, \dots, x_k, \infty, \dots, \infty).$$

For bivariate and other low-dimensional margins it is convenient to have a simpler
alternative notation in which, for example, $F_{ij}(x_i, x_j)$ stands for the marginal
distribution of the components $X_i$ and $X_j$.

The df of a random vector $\mathbf{X}$ is said to be absolutely continuous if

$$F(x_1, \dots, x_d) = \int_{-\infty}^{x_1} \cdots \int_{-\infty}^{x_d} f(u_1, \dots, u_d) \, du_1 \cdots du_d$$

for some non-negative function $f$, which is then known as the joint density of $\mathbf{X}$.
Note that the existence of a joint density implies the existence of marginal densities 
for all k-dimensional marginals. However, the existence of a joint density is not 
necessarily implied by the existence of marginal densities (counterexamples can be 
found in Chapter 7 on copulas). 

In some situations it is convenient to work with the survival function of $\mathbf{X}$, defined
by

$$\bar{F}_{\mathbf{X}}(\mathbf{x}) = \bar{F}_{\mathbf{X}}(x_1, \dots, x_d) = P(\mathbf{X} > \mathbf{x}) = P(X_1 > x_1, \dots, X_d > x_d)$$

and written simply as $\bar{F}$ when no ambiguity arises. The marginal survival function
of $X_i$, written $\bar{F}_{X_i}$ or often simply $\bar{F}_i$, is given by

$$\bar{F}_i(x_i) = P(X_i > x_i) = \bar{F}(-\infty, \dots, -\infty, x_i, -\infty, \dots, -\infty).$$




Conditional distributions and independence. If we have a multivariate model for 
risks in the form of a joint df, survival function or density, then we have implicitly 
described the dependence structure of the risks. We can make conditional probability 
statements about the probability that certain components take certain values given 
that other components take other values. For example, consider again our partition of 
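$\mathbf{X}$ into $(\mathbf{X}_1', \mathbf{X}_2')'$ and assume absolute continuity of the df of $\mathbf{X}$. Let $f_{\mathbf{X}_1}$ denote the
joint density of the k-dimensional marginal distribution $F_{\mathbf{X}_1}$. Then the conditional
distribution of $\mathbf{X}_2$ given $\mathbf{X}_1 = \mathbf{x}_1$ has density

$$f_{\mathbf{X}_2 \mid \mathbf{X}_1}(\mathbf{x}_2 \mid \mathbf{x}_1) = \frac{f(\mathbf{x}_1, \mathbf{x}_2)}{f_{\mathbf{X}_1}(\mathbf{x}_1)}, \tag{6.2}$$

and the corresponding df is

$$F_{\mathbf{X}_2 \mid \mathbf{X}_1}(\mathbf{x}_2 \mid \mathbf{x}_1) = \int_{u_{k+1}=-\infty}^{x_{k+1}} \cdots \int_{u_d=-\infty}^{x_d} \frac{f(x_1, \dots, x_k, u_{k+1}, \dots, u_d)}{f_{\mathbf{X}_1}(\mathbf{x}_1)} \, du_{k+1} \cdots du_d.$$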


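If the joint density of $\mathbf{X}$ factorizes into $f(\mathbf{x}) = f_{\mathbf{X}_1}(\mathbf{x}_1) f_{\mathbf{X}_2}(\mathbf{x}_2)$, then the conditional
distribution and density of $\mathbf{X}_2$ given $\mathbf{X}_1 = \mathbf{x}_1$ are identical to the marginal
distribution and density of $\mathbf{X}_2$: in other words, $\mathbf{X}_1$ and $\mathbf{X}_2$ are independent. We recall
that $\mathbf{X}_1$ and $\mathbf{X}_2$ are independent if and only if

$$F(\mathbf{x}) = F_{\mathbf{X}_1}(\mathbf{x}_1) F_{\mathbf{X}_2}(\mathbf{x}_2), \quad \forall \mathbf{x},$$

or, in the case where $\mathbf{X}$ possesses a joint density, $f(\mathbf{x}) = f_{\mathbf{X}_1}(\mathbf{x}_1) f_{\mathbf{X}_2}(\mathbf{x}_2)$.
The components of $\mathbf{X}$ are mutually independent if and only if $F(\mathbf{x}) = \prod_{i=1}^d F_i(x_i)$
for all $\mathbf{x} \in \mathbb{R}^d$ or, in the case where $\mathbf{X}$ possesses a density, $f(\mathbf{x}) = \prod_{i=1}^d f_i(x_i)$.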


Moments and characteristic function. The mean vector of $\mathbf{X}$, when it exists, is
given by

$$E(\mathbf{X}) := (E(X_1), \dots, E(X_d))'.$$

The covariance matrix, when it exists, is the matrix $\operatorname{cov}(\mathbf{X})$ defined by

$$\operatorname{cov}(\mathbf{X}) := E\big((\mathbf{X} - E(\mathbf{X}))(\mathbf{X} - E(\mathbf{X}))'\big),$$

where the expectation operator acts componentwise on matrices. If we write $\Sigma$ for
$\operatorname{cov}(\mathbf{X})$, then the $(i, j)$th element of this matrix is

$$\sigma_{ij} = \operatorname{cov}(X_i, X_j) = E(X_i X_j) - E(X_i) E(X_j),$$

the ordinary pairwise covariance between $X_i$ and $X_j$. The diagonal elements
$\sigma_{11}, \dots, \sigma_{dd}$ are the variances of the components of $\mathbf{X}$.

The correlation matrix of $\mathbf{X}$, denoted by $\rho(\mathbf{X})$, can be defined by introducing a
standardized vector $\mathbf{Y}$ such that $Y_i = X_i / \sqrt{\operatorname{var}(X_i)}$ for all $i$ and taking
$\rho(\mathbf{X}) := \operatorname{cov}(\mathbf{Y})$. If we write $P$ for $\rho(\mathbf{X})$, then the $(i, j)$th element of this matrix is

$$\rho_{ij} = \rho(X_i, X_j) = \frac{\operatorname{cov}(X_i, X_j)}{\sqrt{\operatorname{var}(X_i) \operatorname{var}(X_j)}}, \tag{6.3}$$

the ordinary pairwise linear correlation of $X_i$ and $X_j$. To express the relationship
between correlation and covariance matrices in matrix form, it is useful to introduce
operators on a covariance matrix $\Sigma$ as follows:

$$\Delta(\Sigma) := \operatorname{diag}(\sqrt{\sigma_{11}}, \dots, \sqrt{\sigma_{dd}}), \tag{6.4}$$
$$\wp(\Sigma) := (\Delta(\Sigma))^{-1} \Sigma (\Delta(\Sigma))^{-1}. \tag{6.5}$$

Thus $\Delta(\Sigma)$ extracts from $\Sigma$ a diagonal matrix of standard deviations, and $\wp(\Sigma)$
extracts a correlation matrix. The covariance and correlation matrices $\Sigma$ and $P$ of
$\mathbf{X}$ are related by

$$P = \wp(\Sigma). \tag{6.6}$$


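Mean vectors and covariance matrices are manipulated extremely easily under
linear operations on the vector $\mathbf{X}$. For any matrix $B \in \mathbb{R}^{k \times d}$ and vector $\mathbf{b} \in \mathbb{R}^k$ we
have

$$E(B\mathbf{X} + \mathbf{b}) = B E(\mathbf{X}) + \mathbf{b}, \tag{6.7}$$
$$\operatorname{cov}(B\mathbf{X} + \mathbf{b}) = B \operatorname{cov}(\mathbf{X}) B'. \tag{6.8}$$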

Covariance matrices (and hence correlation matrices) are therefore positive semidefinite;
writing $\Sigma$ for $\operatorname{cov}(\mathbf{X})$ we see that (6.8) implies that $\operatorname{var}(\mathbf{a}'\mathbf{X}) = \mathbf{a}'\Sigma\mathbf{a} \ge 0$
for any $\mathbf{a} \in \mathbb{R}^d$. If we have that $\mathbf{a}'\Sigma\mathbf{a} > 0$ for any $\mathbf{a} \in \mathbb{R}^d \setminus \{\mathbf{0}\}$, we say that $\Sigma$
is positive definite; in this case the matrix is invertible. We will make use of the
Cholesky factorization of positive-definite covariance matrices at many
points: it is well known that such a matrix can be written as $\Sigma = AA'$ for a lower
triangular matrix $A$ with positive diagonal elements. The matrix $A$ is known as the
Cholesky factor. It will be convenient to denote this factor by $\Sigma^{1/2}$ and its inverse by
$\Sigma^{-1/2}$. Note that there are other ways of defining the “square root” of a symmetric
positive-definite matrix (such as the symmetric decomposition), but we will always
use $\Sigma^{1/2}$ to denote the Cholesky factor.
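
The operators (6.4) and (6.5) and the Cholesky factor are easy to compute numerically. The following sketch uses NumPy; the function names `delta` and `corr_from_cov` and the example matrix are illustrative assumptions, while `numpy.linalg.cholesky` returns the lower triangular factor described above.

```python
import numpy as np


def delta(sigma):
    """Diagonal matrix of standard deviations, as in (6.4)."""
    return np.diag(np.sqrt(np.diag(sigma)))


def corr_from_cov(sigma):
    """Correlation matrix extracted from a covariance matrix, as in (6.5)."""
    d_inv = np.linalg.inv(delta(sigma))
    return d_inv @ sigma @ d_inv


# An illustrative positive-definite covariance matrix (not taken from the text).
sigma = np.array([[4.0, 1.2, 0.0],
                  [1.2, 1.0, 0.3],
                  [0.0, 0.3, 2.25]])

P = corr_from_cov(sigma)          # unit diagonal, off-diagonal correlations
A = np.linalg.cholesky(sigma)     # lower triangular with positive diagonal
A_inv = np.linalg.inv(A)          # plays the role of the inverse Cholesky factor

print(np.round(P, 3))
print(np.allclose(A @ A.T, sigma))
```

For larger matrices one would normally avoid forming explicit inverses and solve triangular systems instead; the explicit form is kept here only because it mirrors the notation of the text.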

In this chapter many properties of the multivariate distribution of a vector X are 
demonstrated using the characteristic function, which is given by 


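$$\phi_{\mathbf{X}}(\mathbf{t}) = E(\exp(i\mathbf{t}'\mathbf{X})) = E(e^{i\mathbf{t}'\mathbf{X}}), \quad \mathbf{t} \in \mathbb{R}^d.$$


6.1.2 Standard Estimators of Covariance and Correlation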


Suppose we have n observations of a d-dimensional risk-factor return vector denoted 
by X1,..., Xn. Typically, these would be daily, weekly, monthly or yearly obser- 
vations forming a multivariate time series. We will assume throughout this chapter 
that the observations are identically distributed in the window of observation and 
either independent or at least serially uncorrelated (also known as multivariate white 
noise). As discussed in Chapter 3, the assumption of independence may be roughly 
tenable for longer time intervals such as months or years. For shorter time intervals 
independence may be a less appropriate assumption (due to a phenomenon known 
as volatility clustering, discussed in Section 3.1.1), but serial correlation of returns 
is often quite weak. 




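We assume that the observations $\mathbf{X}_1, \dots, \mathbf{X}_n$ come from a distribution with mean
vector $\boldsymbol{\mu}$, finite covariance matrix $\Sigma$ and correlation matrix $P$. We now briefly review
the standard estimators of these vector and matrix parameters.

Standard method-of-moments estimators of $\boldsymbol{\mu}$ and $\Sigma$ are given by the sample
mean vector $\bar{\mathbf{X}}$ and the sample covariance matrix $S$. These are defined by

$$\bar{\mathbf{X}} := \frac{1}{n} \sum_{i=1}^n \mathbf{X}_i, \qquad S := \frac{1}{n} \sum_{i=1}^n (\mathbf{X}_i - \bar{\mathbf{X}})(\mathbf{X}_i - \bar{\mathbf{X}})', \tag{6.9}$$

where arithmetic operations on vectors and matrices are performed componentwise.
$\bar{\mathbf{X}}$ is an unbiased estimator but $S$ is biased; an unbiased version may be obtained by
taking $S_u := nS/(n - 1)$, as may be seen by calculating

$$nE(S) = E\Big(\sum_{i=1}^n (\mathbf{X}_i - \boldsymbol{\mu})(\mathbf{X}_i - \boldsymbol{\mu})' - n(\bar{\mathbf{X}} - \boldsymbol{\mu})(\bar{\mathbf{X}} - \boldsymbol{\mu})'\Big) = \sum_{i=1}^n \operatorname{cov}(\mathbf{X}_i) - n \operatorname{cov}(\bar{\mathbf{X}}) = n\Sigma - \Sigma,$$

since $\operatorname{cov}(\bar{\mathbf{X}}) = n^{-1}\Sigma$ when the data vectors are iid, or identically distributed and
uncorrelated.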

The sample correlation matrix $R$ may be easily calculated from the sample covariance
matrix; its $(j, k)$th element is given by $r_{jk} = s_{jk} / \sqrt{s_{jj} s_{kk}}$, where $s_{jk}$ denotes
the $(j, k)$th element of $S$. Or, using the notation introduced in (6.5), we have

$$R = \wp(S),$$


which is the analogous equation to (6.6) for estimators. 
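
As a minimal sketch of the estimators in (6.9), the following NumPy code computes the sample mean vector, the biased and unbiased sample covariance matrices and the sample correlation matrix for data stored with one observation per row. The function name `standard_estimators` and the simulated data are assumptions made for illustration.

```python
import numpy as np


def standard_estimators(X):
    """Return (xbar, S, S_u, R) for an (n, d) array with one observation per row."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    xbar = X.mean(axis=0)              # sample mean vector
    centred = X - xbar
    S = centred.T @ centred / n        # biased sample covariance matrix, divisor n
    S_u = n * S / (n - 1)              # unbiased version, divisor n - 1
    s = np.sqrt(np.diag(S))
    R = S / np.outer(s, s)             # sample correlation matrix, r_jk = s_jk / sqrt(s_jj s_kk)
    return xbar, S, S_u, R


rng = np.random.default_rng(0)
X = rng.standard_normal((500, 3))      # illustrative data only
xbar, S, S_u, R = standard_estimators(X)
print(np.round(R, 3))
```

Note that `numpy.cov` uses the divisor $n - 1$ by default, so it returns $S_u$ rather than $S$; the correlation matrix is the same whichever version is standardized.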

Further properties of the estimators $\bar{\mathbf{X}}$, $S$ and $R$ will very much depend on the true
multivariate distribution of the observations. These quantities are not necessarily 
the best estimators of the corresponding theoretical quantities in all situations. This 
point is often forgotten in financial risk management, where sample covariance 
and correlation matrices are routinely calculated and interpreted with little critical 
consideration of underlying models. 

If our data $\mathbf{X}_1, \dots, \mathbf{X}_n$ are iid multivariate normal, then $\bar{\mathbf{X}}$ and $S$ are the maximum
likelihood estimators (MLEs) of the mean vector $\boldsymbol{\mu}$ and covariance matrix $\Sigma$. Their
behaviour as estimators is well understood, and statistical inference for the model 
parameters is described in all standard texts on multivariate analysis. 

However, the multivariate normal is certainly not a good description of financial 
risk-factor returns over short time intervals, such as daily data, and is often not good 
over longer time intervals either. Under these circumstances the behaviour of the 
standard estimators in (6.9) is often less well understood, and other estimators of 
the true mean vector $\boldsymbol{\mu}$ and covariance matrix $\Sigma$ may perform better in terms of
efficiency and robustness. Roughly speaking, by a more efficient estimator we mean 
an estimator with a smaller expected estimation error; by a more robust estimator 
we mean an estimator whose performance is not so susceptible to the presence of 
outlying data values. 

6.1.3 The Multivariate Normal Distribution


Definition 6.1. $\mathbf{X} = (X_1, \dots, X_d)'$ has a multivariate normal or Gaussian distribution
if

$$\mathbf{X} \stackrel{d}{=} \boldsymbol{\mu} + A\mathbf{Z},$$

where $\mathbf{Z} = (Z_1, \dots, Z_k)'$ is a vector of iid univariate standard normal rvs (mean 0
and variance 1), and $A \in \mathbb{R}^{d \times k}$ and $\boldsymbol{\mu} \in \mathbb{R}^d$ are a matrix and a vector of constants,
respectively.


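It is easy to verify, using (6.7) and (6.8), that the mean vector of this distribution is
$E(\mathbf{X}) = \boldsymbol{\mu}$ and the covariance matrix is $\operatorname{cov}(\mathbf{X}) = \Sigma$, where $\Sigma = AA'$ is a positive
semidefinite matrix. Moreover, using the fact that the characteristic function of a
standard univariate normal variate $Z$ is $\phi_Z(t) = e^{-t^2/2}$, the characteristic function
of $\mathbf{X}$ may be calculated to be

$$\phi_{\mathbf{X}}(\mathbf{t}) = E(e^{i\mathbf{t}'\mathbf{X}}) = \exp(i\mathbf{t}'\boldsymbol{\mu} - \tfrac{1}{2}\mathbf{t}'\Sigma\mathbf{t}), \quad \mathbf{t} \in \mathbb{R}^d. \tag{6.10}$$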


Clearly, the distribution is characterized by its mean vector and covariance matrix,
and hence a standard notation is $\mathbf{X} \sim N_d(\boldsymbol{\mu}, \Sigma)$. Note that the components of $\mathbf{X}$ are
mutually independent if and only if $\Sigma$ is diagonal. For example, $\mathbf{X} \sim N_d(\mathbf{0}, I_d)$ if
and only if $X_1, \dots, X_d$ are iid $N(0, 1)$, the standard univariate normal distribution.

We concentrate on the non-singular case of the multivariate normal when
$\operatorname{rank}(A) = d \le k$. In this case the covariance matrix $\Sigma$ has full rank $d$ and is
therefore invertible (non-singular) and positive definite. Moreover, $\mathbf{X}$ has an absolutely
continuous distribution function with joint density given by

$$f(\mathbf{x}) = \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\{-\tfrac{1}{2}(\mathbf{x} - \boldsymbol{\mu})'\Sigma^{-1}(\mathbf{x} - \boldsymbol{\mu})\}, \quad \mathbf{x} \in \mathbb{R}^d, \tag{6.11}$$

where $|\Sigma|$ denotes the determinant of $\Sigma$.

The form of the density clearly shows that points with equal density lie on ellipsoids
determined by equations of the form $(\mathbf{x} - \boldsymbol{\mu})'\Sigma^{-1}(\mathbf{x} - \boldsymbol{\mu}) = c$, for constants
$c > 0$. In two dimensions the contours of equal density are ellipses, as illustrated
in Figure 6.1. Whenever a multivariate density $f(\mathbf{x})$ depends on $\mathbf{x}$ only through
the quadratic form $(\mathbf{x} - \boldsymbol{\mu})'\Sigma^{-1}(\mathbf{x} - \boldsymbol{\mu})$, it is the density of a so-called elliptical
distribution, as discussed in more detail in Section 6.3.

Definition 6.1 is essentially a simulation recipe for the multivariate normal distribution.
To be explicit, if we wished to generate a vector $\mathbf{X}$ with distribution
$N_d(\boldsymbol{\mu}, \Sigma)$, where $\Sigma$ is positive definite, we would use the following algorithm.


Algorithm 6.2 (simulation of multivariate normal distribution). 


(1) Perform a Cholesky decomposition of $\Sigma$ (see, for example, Press et al. 1992)
to obtain the Cholesky factor $\Sigma^{1/2}$.


(2) Generate a vector $\mathbf{Z} = (Z_1, \dots, Z_d)'$ of independent standard normal variates.


(3) Set $\mathbf{X} = \boldsymbol{\mu} + \Sigma^{1/2}\mathbf{Z}$.
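
The three steps of Algorithm 6.2 translate almost line by line into code. The sketch below uses NumPy and generates many realizations at once, stored as rows; the function name `simulate_mvn`, the seed and the parameter values (standard margins with correlation of −70%) are illustrative assumptions.

```python
import numpy as np


def simulate_mvn(mu, sigma, n, rng):
    """Generate n draws from N_d(mu, sigma) following Algorithm 6.2, one draw per row."""
    mu = np.asarray(mu, dtype=float)
    A = np.linalg.cholesky(sigma)            # step (1): lower triangular Cholesky factor
    Z = rng.standard_normal((n, mu.size))    # step (2): independent standard normal variates
    return mu + Z @ A.T                      # step (3): X = mu + A Z, applied to every row


rng = np.random.default_rng(42)
mu = np.array([0.0, 0.0])
sigma = np.array([[1.0, -0.7],
                  [-0.7, 1.0]])
X = simulate_mvn(mu, sigma, 100_000, rng)

print(np.round(X.mean(axis=0), 2))           # should be close to mu
print(np.round(np.cov(X, rowvar=False), 2))  # should be close to sigma
```

`numpy.linalg.cholesky` raises an error if the matrix is not positive definite, which matches the assumption made in the algorithm.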

Figure 6.1. (a) Perspective and contour plots for the density of a bivariate normal distribution
with standard normal margins and correlation $-70\%$. (b) Corresponding plots for a
bivariate $t$ density with four degrees of freedom (see Example 6.7 for details) and the same
mean vector and covariance matrix as the normal distribution. Contour lines are plotted at
the same heights for both densities.


We now summarize further useful properties of the multivariate normal. These 
properties underline the attractiveness of the multivariate normal for computational 
work in risk management. Note, however, that many of them are in fact shared by 
the broader classes of normal mixture distributions and elliptical distributions (see 
Section 6.3.3 for properties of the latter). 


Linear combinations. If we take linear combinations of multivariate normal random
vectors, then these remain multivariate normal. Let $\mathbf{X} \sim N_d(\boldsymbol{\mu}, \Sigma)$ and take
any $B \in \mathbb{R}^{k \times d}$ and $\mathbf{b} \in \mathbb{R}^k$. Then it is easily shown (e.g. using the characteristic
function (6.10)) that

$$B\mathbf{X} + \mathbf{b} \sim N_k(B\boldsymbol{\mu} + \mathbf{b}, B\Sigma B'). \tag{6.12}$$

As a special case, if $\mathbf{a} \in \mathbb{R}^d$, then

$$\mathbf{a}'\mathbf{X} \sim N(\mathbf{a}'\boldsymbol{\mu}, \mathbf{a}'\Sigma\mathbf{a}), \tag{6.13}$$

and this fact is used routinely in the variance-covariance approach to risk management,
as discussed in Section 9.2.2.

In this context it is interesting to note the following elegant characterization of
multivariate normality. It is easily shown using characteristic functions that $\mathbf{X}$ is
multivariate normal if and only if $\mathbf{a}'\mathbf{X}$ is univariate normal for all vectors
$\mathbf{a} \in \mathbb{R}^d \setminus \{\mathbf{0}\}$.
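
Marginal distributions. It is clear from (6.13) that the univariate marginal distributions
of $\mathbf{X}$ must be univariate normal. More generally, using the $\mathbf{X} = (\mathbf{X}_1', \mathbf{X}_2')'$
notation from Section 6.1.1 and extending this notation naturally to $\boldsymbol{\mu}$ and $\Sigma$,

$$\boldsymbol{\mu} = \begin{pmatrix} \boldsymbol{\mu}_1 \\ \boldsymbol{\mu}_2 \end{pmatrix}, \qquad \Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix},$$

property (6.12) implies that the marginal distributions of $\mathbf{X}_1$ and $\mathbf{X}_2$ are also multivariate
normal and are given by $\mathbf{X}_1 \sim N_k(\boldsymbol{\mu}_1, \Sigma_{11})$ and $\mathbf{X}_2 \sim N_{d-k}(\boldsymbol{\mu}_2, \Sigma_{22})$.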


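Conditional distributions. Assuming that $\Sigma$ is positive definite, the conditional
distributions of $\mathbf{X}_2$ given $\mathbf{X}_1$ and of $\mathbf{X}_1$ given $\mathbf{X}_2$ may also be shown to be multivariate
normal. For example, $\mathbf{X}_2 \mid \mathbf{X}_1 = \mathbf{x}_1 \sim N_{d-k}(\boldsymbol{\mu}_{2.1}, \Sigma_{22.1})$, where

$$\boldsymbol{\mu}_{2.1} = \boldsymbol{\mu}_2 + \Sigma_{21}\Sigma_{11}^{-1}(\mathbf{x}_1 - \boldsymbol{\mu}_1) \quad \text{and} \quad \Sigma_{22.1} = \Sigma_{22} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}$$

are the conditional mean vector and covariance matrix.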
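
The conditional mean vector and covariance matrix can be computed directly from the partitioned parameters. The following NumPy sketch does this for the conditional distribution of the last $d - k$ components given the first $k$; the function name `conditional_mvn` and the bivariate example are assumptions for illustration.

```python
import numpy as np


def conditional_mvn(mu, sigma, k, x1):
    """Parameters of X2 | X1 = x1, where X1 holds the first k components of X."""
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    x1 = np.asarray(x1, dtype=float)
    mu1, mu2 = mu[:k], mu[k:]
    s11, s12 = sigma[:k, :k], sigma[:k, k:]
    s21, s22 = sigma[k:, :k], sigma[k:, k:]
    # Solve linear systems with Sigma_11 rather than forming its inverse explicitly.
    mu_cond = mu2 + s21 @ np.linalg.solve(s11, x1 - mu1)
    sigma_cond = s22 - s21 @ np.linalg.solve(s11, s12)
    return mu_cond, sigma_cond


# Bivariate example with standard normal margins and correlation -0.7.
mu = [0.0, 0.0]
sigma = [[1.0, -0.7],
         [-0.7, 1.0]]
mu_cond, sigma_cond = conditional_mvn(mu, sigma, k=1, x1=[1.0])
print(mu_cond, sigma_cond)
```

In this example the conditional mean is $\rho x_1 = -0.7$ and the conditional variance is $1 - \rho^2 = 0.51$. The conditional covariance matrix does not depend on the observed value $\mathbf{x}_1$.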
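
Quadratic forms. If $\mathbf{X} \sim N_d(\boldsymbol{\mu}, \Sigma)$ with $\Sigma$ positive definite, then

$$(\mathbf{X} - \boldsymbol{\mu})'\Sigma^{-1}(\mathbf{X} - \boldsymbol{\mu}) \sim \chi^2_d, \tag{6.14}$$

a chi-squared distribution with $d$ degrees of freedom. This is seen by observing that
$\mathbf{Z} = \Sigma^{-1/2}(\mathbf{X} - \boldsymbol{\mu}) \sim N_d(\mathbf{0}, I_d)$ and $(\mathbf{X} - \boldsymbol{\mu})'\Sigma^{-1}(\mathbf{X} - \boldsymbol{\mu}) = \mathbf{Z}'\mathbf{Z} \sim \chi^2_d$. This
property (6.14) is useful for checking multivariate normality (see Section 6.1.4).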


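Convolutions. If $\mathbf{X}$ and $\mathbf{Y}$ are independent d-dimensional random vectors satisfying
$\mathbf{X} \sim N_d(\boldsymbol{\mu}, \Sigma)$ and $\mathbf{Y} \sim N_d(\tilde{\boldsymbol{\mu}}, \tilde{\Sigma})$, then we may take the product of characteristic
functions to show that $\mathbf{X} + \mathbf{Y} \sim N_d(\boldsymbol{\mu} + \tilde{\boldsymbol{\mu}}, \Sigma + \tilde{\Sigma})$.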


6.1.4 Testing Multivariate Normality 


We now consider the issue of testing whether the data X1, ..., Xn are observations 
from a multivariate normal distribution. 


Univariate tests. If $\mathbf{X}_1, \dots, \mathbf{X}_n$ are iid multivariate normal, then for $1 \le j \le d$
the univariate sample $X_{1,j}, \dots, X_{n,j}$ consisting of the observations of the jth component
must be iid univariate normal; in fact, any univariate sample constructed
from a linear combination of the data of the form $\mathbf{a}'\mathbf{X}_1, \dots, \mathbf{a}'\mathbf{X}_n$ must be iid univariate
normal. This can be assessed graphically with a Q-Q plot against a standard
normal reference distribution, or it can be tested formally using one of the many 
numerical tests of normality (see Section 3.1.2 for more details of univariate tests 
of normality). 


Multivariate tests. To test for multivariate normality it is not sufficient to test that 
the univariate margins of the distribution are normal. We will see in Chapter 7 that 
it is possible to have multivariate distributions with normal margins that are not 
themselves multivariate normal distributions. Thus we also need to be able to test 
joint normality, and a simple way of doing this is to exploit the fact that the quadratic 
form in (6.14) has a chi-squared distribution. Suppose we estimate $\boldsymbol{\mu}$ and $\Sigma$ using
the standard estimators in (6.9) and construct the data

$$\{D_i^2 = (\mathbf{X}_i - \bar{\mathbf{X}})'S^{-1}(\mathbf{X}_i - \bar{\mathbf{X}}) : i = 1, \dots, n\}. \tag{6.15}$$

Because the estimates of the mean vector and the covariance matrix are used in the
construction of each $D_i^2$, these data are not independent, even if the original $\mathbf{X}_i$ data
were. Moreover, the marginal distribution of $D_i^2$ under the null hypothesis is not
exactly chi-squared; in fact, we have that $n(n - 1)^{-2} D_i^2 \sim \operatorname{Beta}(\tfrac{1}{2}d, \tfrac{1}{2}(n - d - 1))$,
so that the true distribution is a scaled beta distribution, although it turns out to be
very close to chi-squared for large $n$. We expect $D_1^2, \dots, D_n^2$ to behave roughly like
an iid sample from a $\chi^2_d$ distribution, and for simplicity we construct Q-Q plots
against this distribution. (It is also possible to make Q-Q plots against the beta
reference distribution, and these look very similar.)

Numerical tests of multivariate normality based on multivariate measures of skewness
and kurtosis are also possible. Suppose we define, in analogy to (3.1),

$$b_d = \frac{1}{n^2} \sum_{i=1}^n \sum_{j=1}^n D_{ij}^3, \qquad k_d = \frac{1}{n} \sum_{i=1}^n D_i^4, \tag{6.16}$$

where $D_i$ is given in (6.15) and is known as the Mahalanobis distance between
$\mathbf{X}_i$ and $\bar{\mathbf{X}}$, and $D_{ij} = (\mathbf{X}_i - \bar{\mathbf{X}})'S^{-1}(\mathbf{X}_j - \bar{\mathbf{X}})$ is known as the Mahalanobis angle
between $\mathbf{X}_i - \bar{\mathbf{X}}$ and $\mathbf{X}_j - \bar{\mathbf{X}}$. Under the null hypothesis of multivariate normality
the asymptotic distributions of these statistics as $n \to \infty$ are

$$\tfrac{1}{6} n b_d \sim \chi^2_{d(d+1)(d+2)/6}, \qquad \frac{k_d - d(d + 2)}{\sqrt{8d(d + 2)/n}} \sim N(0, 1). \tag{6.17}$$

Mardia’s test of multinormality involves comparing the skewness and kurtosis statistics
with the above theoretical reference distributions. Since large values of the
statistics cast doubt on the multivariate normal model, one-sided tests are generally
performed. Usually, the tests of kurtosis and skewness are performed separately,
although there are also a number of joint (or so-called omnibus) tests (see Notes and
Comments).
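
The quantities in (6.15), (6.16) and (6.17) can be computed in a few lines. The following NumPy sketch is an illustration under stated assumptions rather than a reference implementation: the function name `mardia_statistics` and the returned dictionary are hypothetical, the biased estimator $S$ of (6.9) is used, and only the one-sided kurtosis p-value is evaluated (via the standard normal tail); the skewness statistic would be compared with a chi-squared quantile using a statistics library.

```python
import math

import numpy as np


def mardia_statistics(X):
    """Mahalanobis distances and Mardia's skewness and kurtosis statistics."""
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    centred = X - X.mean(axis=0)
    S = centred.T @ centred / n                    # biased estimator from (6.9)
    G = centred @ np.linalg.solve(S, centred.T)    # G[i, j] is the Mahalanobis angle D_ij
    D2 = np.diag(G).copy()                         # squared Mahalanobis distances D_i^2
    b_d = np.sum(G ** 3) / n ** 2                  # multivariate skewness
    k_d = np.mean(D2 ** 2)                         # multivariate kurtosis
    skew_stat = n * b_d / 6.0                      # compare with chi-squared, skew_df degrees of freedom
    skew_df = d * (d + 1) * (d + 2) // 6
    kurt_z = (k_d - d * (d + 2)) / math.sqrt(8.0 * d * (d + 2) / n)
    kurt_p = 0.5 * math.erfc(kurt_z / math.sqrt(2.0))   # one-sided upper-tail p-value
    return {"D2": D2, "b_d": b_d, "k_d": k_d, "skew_stat": skew_stat,
            "skew_df": skew_df, "kurt_z": kurt_z, "kurt_p": kurt_p}


rng = np.random.default_rng(7)
X = rng.standard_normal((1000, 3))   # illustrative Gaussian data, so the null hypothesis holds
result = mardia_statistics(X)
print({key: value for key, value in result.items() if key != "D2"})
```

The sketch forms the full $n \times n$ matrix of Mahalanobis angles, which is convenient for moderate $n$ but memory-hungry for very long series. The array `D2` can be sorted and plotted against chi-squared quantiles to produce the Q-Q plots described above.
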


Example 6.3 (on the normality of returns on Dow Jones 30 stocks). In Sec- 
tion 3.1.2 we applied univariate tests of normality to an arbitrary subgroup of ten 
stocks from the Dow Jones index. We took eight years of data spanning the period 
1993-2000 and formed daily, weekly, monthly and quarterly logarithmic returns. In 
this example we apply Mardia’s tests of multinormality based on both multivariate 
skewness and kurtosis to the multivariate data for all ten stocks. The results are 
shown in Table 6.1. We also compare the $D_i^2$ data (6.15) to a $\chi^2_{10}$ distribution using
a Q-Q plot (see Figure 6.2).

The daily, weekly and monthly return data fail the multivariate tests of normality.
For quarterly return data the multivariate kurtosis test does not reject the null
hypothesis but the skewness test does; the Q-Q plot in Figure 6.2 (d) looks slightly
more linear. There is therefore some evidence that returns over a quarter year are
close to being normally distributed, which might indicate a central limit theorem
effect taking place, although the sample size is too small to reach any more reliable
conclusion. The evidence against the multivariate normal distribution is certainly
overwhelming for daily, weekly and monthly data.


Table 6.1. Mardia’s tests of multivariate normality based on the multivariate measures of
skewness and kurtosis in (6.16) and the asymptotic distributions in (6.17) (see Example 6.3
for details).

              Daily    Weekly   Monthly   Quarterly
n              2020      416        96          32
b_10           9.31     9.91     21.10       50.10
p-value        0.00     0.00      0.00        0.02
k_10         242.45   177.04    142.65      120.83
p-value        0.00     0.00      0.00        0.44


Figure 6.2. Q-Q plot of the $D_i^2$ data in (6.15) against a $\chi^2_{10}$ distribution for the data sets of
Example 6.3: (a) daily analysis, (b) weekly analysis, (c) monthly analysis and (d) quarterly
analysis. Under the null hypothesis of multivariate normality these should be roughly linear.


The results in Example 6.3 are fairly typical for financial return data. This suggests 
that in many risk-management applications the multivariate normal distribution is 
not a good description of reality. It has three main defects, all of which are discussed
at various points in this book. 


(1) The tails of its univariate marginal distributions are too thin; they do not assign 
enough weight to extreme events (see also Section 3.1.2). 


(2) The joint tails of the distribution do not assign enough weight to joint extreme 
outcomes (see also Section 7.3.1). 


(3) The distribution has a strong form of symmetry, known as elliptical symmetry. 


In the next section we look at models that address some of these defects. We con- 
sider normal variance mixture models, which share the elliptical symmetry of the 
multivariate normal but have the flexibility to address (1) and (2) above; we also 
look at normal mean-variance mixture models, which introduce some asymmetry
and thus address (3). 


Notes and Comments 


Much of the material covered briefly in Section 6.1 can be found in greater detail 
in standard texts on multivariate statistical analysis such as Mardia, Kent and Bibby 
(1979), Seber (1984), Giri (1996) and Johnson and Wichern (2002). 

The true distribution of $D_i^2 = (\mathbf{X}_i - \bar{\mathbf{X}})'S^{-1}(\mathbf{X}_i - \bar{\mathbf{X}})$ for iid Gaussian data was
shown by Gnanadesikan and Kettenring (1972) to be a scaled beta distribution (see
also Gnanadesikan 1997). The implications of this fact for the construction of Q-Q
plots in small samples are considered by Small (1978). References for multivariate
measures of skewness and kurtosis and Mardia’s test of multinormality are Mardia 
(1970, 1974, 1975). See also Mardia, Kent and Bibby (1979), the entry on “multi- 
variate normality, testing for” in Volume 6 of the Encyclopedia of Statistical Sciences 
(Kotz, Johnson and Read 1985), and the entry on “Mardia’s test of multinormality” 
in Volume 5 of the same publication. A paper that compares the performance of 
different goodness-of-fit tests for the multivariate normal distribution implemented 
in R is Joenssen and Vogel (2014). 


6.2 Normal Mixture Distributions 


In this section we generalize the multivariate normal to obtain multivariate normal 
mixture distributions. The crucial idea is the introduction of randomness into first 
the covariance matrix and then the mean vector of a multivariate normal distribution 
via a positive mixing variable, which will be known throughout as W. 


6.2.1 Normal Variance Mixtures 


Definition 6.4. The random vector X is said to have a (multivariate) normal variance 
mixture distribution if 


$$X \overset{d}{=} \mu + \sqrt{W} A Z, \tag{6.18}$$

where 

(i) $Z \sim N_k(0, I_k)$; 

(ii) $W \ge 0$ is a non-negative, scalar-valued rv that is independent of $Z$; and 

(iii) $A \in \mathbb{R}^{d \times k}$ and $\mu \in \mathbb{R}^d$ are a matrix and a vector of constants, respectively. 


Such distributions are known as variance mixtures, since if we condition on the rv 
W, we observe that $X \mid W = w \sim N_d(\mu, w\Sigma)$, where $\Sigma = AA'$. The distribution 
of X can be thought of as a composite distribution constructed by taking a set of 
multivariate normal distributions with the same mean vector and with the same 
covariance matrix up to a multiplicative constant w. The mixture distribution is 
constructed by drawing randomly from this set of component multivariate normals 
according to a set of “weights” determined by the distribution of W; the resulting 
mixture is not itself a multivariate normal distribution. In the context of modelling 
risk-factor returns, the mixing variable W could be interpreted as a shock that arises 
from new information and impacts the volatilities of all risk factors. 

As for the multivariate normal, we are most interested in the case where 
$\operatorname{rank}(A) = d \le k$ and $\Sigma$ is a full-rank, positive-definite matrix; this will 
give us a non-singular normal variance mixture. 

Provided that $W$ has a finite expectation, we may easily calculate that 

$$E(X) = E(\mu + \sqrt{W} A Z) = \mu + E(\sqrt{W}) A E(Z) = \mu$$

and that 

$$\operatorname{cov}(X) = E\big((\sqrt{W} A Z)(\sqrt{W} A Z)'\big) = E(W) A E(Z Z') A' = E(W) \Sigma. \tag{6.19}$$

We generally refer to $\mu$ and $\Sigma$ as the location vector and the dispersion matrix of 
the distribution. Note that $\Sigma$ (the covariance matrix of $AZ$) is only the covariance 
matrix of $X$ if $E(W) = 1$, and that $\mu$ is only the mean vector when $E(X)$ is defined, 
which requires $E(W^{1/2}) < \infty$. The correlation matrices of $X$ and $AZ$ are the same 
when $E(W) < \infty$. Note also that these distributions provide good examples of 
models where a lack of correlation does not necessarily imply independence of the 
components of $X$; indeed, we have the following simple result. 


Lemma 6.5. Let $(X_1, X_2)$ have a normal mixture distribution with $A = I_2$ and 
$E(W) < \infty$ so that $\operatorname{cov}(X_1, X_2) = 0$. Then $X_1$ and $X_2$ are independent if and only 
if $W$ is almost surely constant, i.e. $(X_1, X_2)$ are normally distributed. 


Proof. If $W$ is almost surely a constant, then $(X_1, X_2)$ have a bivariate normal 
distribution and are independent. Conversely, if $(X_1, X_2)$ are independent, then we 
must have $E(|X_1||X_2|) = E(|X_1|) E(|X_2|)$. We calculate that 

$$E(|X_1||X_2|) = E(W|Z_1||Z_2|) = E(W) E(|Z_1|) E(|Z_2|) \ge (E(\sqrt{W}))^2 E(|Z_1|) E(|Z_2|) = E(|X_1|) E(|X_2|),$$

and we can only have equality throughout when $W$ is a constant. 

Using (6.10), we can calculate that the characteristic function of a normal variance 
mixture is given by 

$$\phi_X(t) = E\big(E(e^{it'X} \mid W)\big) = E\big(\exp(it'\mu - \tfrac{1}{2} W t'\Sigma t)\big) = e^{it'\mu} \hat{H}(\tfrac{1}{2} t'\Sigma t), \tag{6.20}$$

where $\hat{H}(\theta) = \int_0^\infty e^{-\theta v} \, dH(v)$ is the Laplace-Stieltjes transform of the df $H$ of 
$W$. Based on (6.20) we use the notation $X \sim M_d(\mu, \Sigma, \hat{H})$ for normal variance 
mixtures. 
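
Lemma 6.5 can be illustrated numerically. The following minimal sketch assumes an inverse gamma mixing variable generated as $W = \nu/V$ with $V \sim \chi^2_\nu$, together with $A = I_2$ and $\mu = 0$; the function names, the choice $\nu = 6$, the sample size and the seed are illustrative assumptions rather than part of the text. It estimates the correlation of $(X_1, X_2)$, which should be close to zero, and compares an estimate of $E(|X_1||X_2|)$ with the product of the estimates of $E(|X_1|)$ and $E(|X_2|)$, which would agree if the components were independent.

```python
import math
import random


def sample_pair(nu, rng):
    """Draw (X1, X2) = sqrt(W) * (Z1, Z2) with A = I_2 and nu / W ~ chi-squared(nu)."""
    w = nu / rng.gammavariate(nu / 2.0, 2.0)  # chi-squared(nu) is Gamma(shape nu/2, scale 2)
    root_w = math.sqrt(w)
    return root_w * rng.gauss(0.0, 1.0), root_w * rng.gauss(0.0, 1.0)


def dependence_check(nu=6.0, n=100_000, seed=1):
    """Return (sample correlation, mean of |X1||X2|, mean|X1| * mean|X2|)."""
    rng = random.Random(seed)
    pairs = [sample_pair(nu, rng) for _ in range(n)]
    mean1 = sum(x for x, _ in pairs) / n
    mean2 = sum(y for _, y in pairs) / n
    cov = sum((x - mean1) * (y - mean2) for x, y in pairs) / n
    var1 = sum((x - mean1) ** 2 for x, _ in pairs) / n
    var2 = sum((y - mean2) ** 2 for _, y in pairs) / n
    corr = cov / math.sqrt(var1 * var2)
    mean_abs_product = sum(abs(x) * abs(y) for x, y in pairs) / n
    product_of_mean_abs = (sum(abs(x) for x, _ in pairs) / n) * (
        sum(abs(y) for _, y in pairs) / n
    )
    return corr, mean_abs_product, product_of_mean_abs


if __name__ == "__main__":
    corr, lhs, rhs = dependence_check()
    print(f"correlation ~ {corr:.3f}; E|X1||X2| ~ {lhs:.3f}; E|X1| E|X2| ~ {rhs:.3f}")
```

The gap between the last two quantities reflects the strict inequality $E(W) > (E(\sqrt{W}))^2$ used in the proof; it would vanish only for a constant $W$.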

Assuming that $\Sigma$ is positive definite and that the distribution of $W$ has no point 
mass at zero, we may derive the joint density of a normal variance mixture distribution. 
Writing $f_{X|W}$ for the (Gaussian) conditional density of $X$ given $W$, the density 
of $X$ is given by 

$$f(x) = \int f_{X|W}(x \mid w) \, dH(w) = \int \frac{w^{-d/2}}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\left(-\frac{(x - \mu)' \Sigma^{-1} (x - \mu)}{2w}\right) dH(w), \tag{6.21}$$

in terms of the Lebesgue-Stieltjes integral; when $H$ has density $h$ we simply mean 
the Riemann integral $\int_0^\infty f_{X|W}(x \mid w) h(w) \, dw$. All such densities will depend on 
$x$ only through the quadratic form $(x - \mu)' \Sigma^{-1} (x - \mu)$, and this means they are 
the densities of elliptical distributions, as will be discussed in Section 6.3. 


Example 6.6 (multivariate two-point normal mixture distribution). Simple 
examples of normal mixtures are obtained when $W$ is a discrete rv. For example, the 
two-point normal mixture model is obtained by taking $W$ in (6.18) to be a discrete 
rv that assumes the distinct positive values $k_1$ and $k_2$ with probabilities $p$ and $1 - p$, 
respectively. By setting $k_2$ large relative to $k_1$ and choosing $p$ large, this distribution 
might be used to define two regimes: an ordinary regime that holds most of the time 
and a stress regime that occurs with small probability $1 - p$. Obviously this idea 
extends to k-point mixture models. 


Example 6.7 (multivariate t distribution). If we take $W$ in (6.18) to be an rv with 
an inverse gamma distribution $W \sim \mathrm{Ig}(\tfrac{1}{2}\nu, \tfrac{1}{2}\nu)$ (which is equivalent to saying that 
$\nu/W \sim \chi^2_\nu$), then $X$ has a multivariate t distribution with $\nu$ degrees of freedom 
(see Section A.2.6 for more details concerning the inverse gamma distribution). Our 
notation for the multivariate t is $X \sim t_d(\nu, \mu, \Sigma)$, and we note that $\Sigma$ is not the 
covariance matrix of $X$ in this definition of the multivariate t. Since $E(W) = \nu/(\nu - 2)$ 
we have $\operatorname{cov}(X) = (\nu/(\nu - 2)) \Sigma$, and the covariance matrix (and correlation 
matrix) of this distribution is only defined if $\nu > 2$. 

Using (6.21), the density can be calculated to be 

$$f(x) = \frac{\Gamma(\tfrac{1}{2}(\nu + d))}{\Gamma(\tfrac{1}{2}\nu) (\pi\nu)^{d/2} |\Sigma|^{1/2}} \left(1 + \frac{(x - \mu)' \Sigma^{-1} (x - \mu)}{\nu}\right)^{-(\nu + d)/2}. \tag{6.22}$$


Clearly, the locus of points with equal density is again an ellipsoid with equation 
$(x - \mu)' \Sigma^{-1} (x - \mu) = c$ for some $c > 0$. A bivariate example with four degrees 
of freedom is given in Figure 6.1. In comparison with the multivariate normal, the 
contours of equal density rise more quickly in the centre of the distribution and 
decay more gradually on the “lower slopes” of the distribution. In comparison with 
the multivariate normal, the multivariate t has heavier marginal tails (as discussed in 
Section 5.1.2) and a more pronounced tendency to generate simultaneous extreme 
values (see also Section 7.3.1). 
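
To make the comparison with the multivariate normal concrete, the density (6.22) can be evaluated directly. The sketch below is an illustration under stated assumptions: the function names are ours, and the caller is assumed to supply the quadratic form $q = (x - \mu)' \Sigma^{-1} (x - \mu)$ and the determinant $|\Sigma|$, so that no matrix algebra is needed. Both densities are computed on the log scale for numerical stability.

```python
import math


def mvt_density(q, d, nu, det_sigma=1.0):
    """Multivariate t density (6.22) as a function of the quadratic form q.

    q         : (x - mu)' Sigma^{-1} (x - mu), assumed precomputed by the caller
    d         : dimension
    nu        : degrees of freedom
    det_sigma : determinant of the dispersion matrix Sigma
    """
    log_const = (
        math.lgamma(0.5 * (nu + d))
        - math.lgamma(0.5 * nu)
        - 0.5 * d * math.log(math.pi * nu)
        - 0.5 * math.log(det_sigma)
    )
    return math.exp(log_const - 0.5 * (nu + d) * math.log1p(q / nu))


def mvn_density(q, d, det_sigma=1.0):
    """Multivariate normal density with covariance Sigma, in terms of the same q."""
    log_const = -0.5 * d * math.log(2.0 * math.pi) - 0.5 * math.log(det_sigma)
    return math.exp(log_const - 0.5 * q)


if __name__ == "__main__":
    # Bivariate case with four degrees of freedom, as in the example in the text.
    for q in (0.0, 4.0, 25.0):
        print(q, mvt_density(q, d=2, nu=4.0), mvn_density(q, d=2))
```

For $d = 2$ and $|\Sigma| = 1$ the two densities coincide at the centre ($q = 0$), where both equal $1/(2\pi)$, while far from the centre the t density is larger by orders of magnitude. Note that this compares a t distribution with dispersion matrix $\Sigma$ against a normal with covariance matrix $\Sigma$, so the two distributions do not have the same covariance matrix.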


Example 6.8 (symmetric generalized hyperbolic distribution). A flexible family 
of normal variance mixtures is obtained by taking $W$ in (6.18) to have a generalized 
inverse Gaussian (GIG) distribution, $W \sim N^-(\lambda, \chi, \psi)$ (see Section A.2.5). Using 
(6.21), it can be shown that a normal variance mixture constructed with this mixing 
density has the joint density 

$$f(x) = \frac{(\sqrt{\chi\psi})^{-\lambda} \psi^{d/2}}{(2\pi)^{d/2} |\Sigma|^{1/2} K_\lambda(\sqrt{\chi\psi})} \, \frac{K_{\lambda - (d/2)}\big(\sqrt{(\chi + (x - \mu)' \Sigma^{-1} (x - \mu))\psi}\big)}{\big(\sqrt{(\chi + (x - \mu)' \Sigma^{-1} (x - \mu))\psi}\big)^{(d/2) - \lambda}}, \tag{6.23}$$

where $K_\lambda$ denotes a modified Bessel function of the third kind (see Section A.2.5 
for more details). This distribution is a special case of the more general family of 
multivariate generalized hyperbolic distributions, which we will discuss in greater 
detail in Section 6.2.2. The more general family can be obtained as mean-variance 
mixtures of normals, which are not necessarily elliptical distributions. 

The GIG mixing distribution is very flexible and contains the gamma and inverse 
gamma distributions as special boundary cases (corresponding, respectively, to 
$\lambda > 0$, $\chi = 0$ and to $\lambda < 0$, $\psi = 0$). In these cases the density in (6.23) should 
be interpreted as a limit as $\chi \to 0$ or as $\psi \to 0$. (Information on the limits of 
Bessel functions is found in Section A.2.5.) The gamma mixing distribution yields 
Laplace distributions or so-called symmetric variance-gamma (VG) models, and the 
inverse gamma yields the t as in Example 6.7; to be precise, the t corresponds to 
the case when $\lambda = -\nu/2$ and $\chi = \nu$. The special cases $\lambda = -0.5$ and $\lambda = 1$ have 
also attracted attention in financial modelling. The former gives rise to the symmetric 
normal inverse Gaussian (NIG) distribution; the latter gives rise to a symmetric 
multivariate distribution whose one-dimensional margins are known simply as 
hyperbolic distributions. 

To calculate the covariance matrix of distributions in the symmetric generalized 
hyperbolic family, we require the mean of the GIG distribution, which is given 
in (A.15) for the case $\chi > 0$ and $\psi > 0$. The covariance matrix of the multivariate 
distribution in (6.23) follows from (6.19). 


Normal variance mixture distributions are easy to work with under linear operations, 
as shown in the following simple proposition. 


Proposition 6.9. If $X \sim M_d(\mu, \Sigma, \hat{H})$ and $Y = BX + b$, where $B \in \mathbb{R}^{k \times d}$ and 
$b \in \mathbb{R}^k$, then $Y \sim M_k(B\mu + b, B\Sigma B', \hat{H})$. 

Proof. The characteristic function in (6.20) may be used to show that 

$$\phi_Y(t) = E(e^{it'(BX + b)}) = e^{it'b} \phi_X(B't) = e^{it'(B\mu + b)} \hat{H}(\tfrac{1}{2} t'B\Sigma B't).$$

The subclass of mixture distributions specified by $\hat{H}$ is therefore closed under 
linear transformations. For example, if $X$ has a multivariate t distribution with $\nu$ 
degrees of freedom, then so does any linear transformation of $X$; the linear combination 
$a'X$ would have a univariate t distribution with $\nu$ degrees of freedom (more 
precisely, the distribution $a'X \sim t_1(\nu, a'\mu, a'\Sigma a)$). 

Normal variance mixture distributions (and the mean-variance mixtures considered 
later in Section 6.2.2) are easily simulated, the method being obvious from 
Definition 6.4. To generate a variate $X \sim M_d(\mu, \Sigma, \hat{H})$ with $\Sigma$ positive definite, 
we use the following algorithm. 


Algorithm 6.10 (simulation of normal variance mixtures). 

(1) Generate $Z \sim N_d(0, \Sigma)$ using Algorithm 6.2. 

(2) Generate independently a positive mixing variable $W$ with df $H$ (corresponding 
to the Laplace-Stieltjes transform $\hat{H}$). 

(3) Set $X = \mu + \sqrt{W} Z$. 


To generate $X \sim t_d(\nu, \mu, \Sigma)$, the mixing variable $W$ should have an $\mathrm{Ig}(\tfrac{1}{2}\nu, \tfrac{1}{2}\nu)$ 
distribution; it is helpful to note that in this case $\nu/W \sim \chi^2_\nu$, a chi-squared distribution 
with $\nu$ degrees of freedom. Sampling from a generalized hyperbolic distribution with 
density (6.23) requires us to generate $W \sim N^-(\lambda, \chi, \psi)$. Sampling from the GIG 
distribution can be accomplished using a rejection algorithm proposed by Atkinson 
(1982). 
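
The steps of Algorithm 6.10 can be sketched in a few lines of code. The version below is a minimal illustration for the bivariate case, written with the Python standard library only. It assumes that a Cholesky factor of $\Sigma$ may stand in for Algorithm 6.2, and all function names are ours. Two mixing variables are shown: the inverse gamma case that yields the multivariate t, and the two-point case of Example 6.6. No GIG sampler is included.

```python
import math
import random


def cholesky_2x2(sigma):
    """Lower-triangular A with A A' = sigma for a 2x2 positive-definite matrix."""
    (s11, s12), (_, s22) = sigma
    a11 = math.sqrt(s11)
    a21 = s12 / a11
    a22 = math.sqrt(s22 - a21 * a21)
    return ((a11, 0.0), (a21, a22))


def draw_inverse_gamma_w(nu, rng):
    """W ~ Ig(nu/2, nu/2), generated as nu divided by a chi-squared(nu) variate."""
    return nu / rng.gammavariate(nu / 2.0, 2.0)


def draw_two_point_w(k1, k2, p, rng):
    """W = k1 with probability p and W = k2 with probability 1 - p."""
    return k1 if rng.random() < p else k2


def simulate_mixture(mu, sigma, draw_w, n, seed=0):
    """Generate n bivariate draws of X = mu + sqrt(W) Z with Z ~ N_2(0, sigma)."""
    rng = random.Random(seed)
    a = cholesky_2x2(sigma)
    draws = []
    for _ in range(n):
        # Step (1): Z ~ N_2(0, Sigma) from two independent standard normals.
        e1, e2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        z = (a[0][0] * e1, a[1][0] * e1 + a[1][1] * e2)
        # Step (2): independent positive mixing variable W.
        root_w = math.sqrt(draw_w(rng))
        # Step (3): X = mu + sqrt(W) Z.
        draws.append((mu[0] + root_w * z[0], mu[1] + root_w * z[1]))
    return draws


def sample_covariance(points):
    """Sample covariance matrix (divisor n) of a list of bivariate points."""
    n = len(points)
    m1 = sum(p[0] for p in points) / n
    m2 = sum(p[1] for p in points) / n
    v11 = sum((p[0] - m1) ** 2 for p in points) / n
    v22 = sum((p[1] - m2) ** 2 for p in points) / n
    v12 = sum((p[0] - m1) * (p[1] - m2) for p in points) / n
    return ((v11, v12), (v12, v22))


if __name__ == "__main__":
    sigma = ((1.0, 0.5), (0.5, 2.0))
    t_draws = simulate_mixture((0.0, 0.0), sigma,
                               lambda rng: draw_inverse_gamma_w(10.0, rng), 50_000)
    print(sample_covariance(t_draws))  # compare with (nu / (nu - 2)) * Sigma
```

By (6.19) the sample covariance matrix should be close to $E(W)\Sigma$, that is, $(\nu/(\nu - 2))\Sigma$ in the t case and $(pk_1 + (1 - p)k_2)\Sigma$ in the two-point case.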


6.2.2 Normal Mean-Variance Mixtures 


All of the multivariate distributions we have considered so far have elliptical symme- 
try (see Section 6.3.2 for explanation) and this may well be an oversimplified model 
for real risk-factor return data. Among other things, elliptical symmetry implies that 
all one-dimensional marginal distributions are rigidly symmetric, which contradicts 
the frequent observation for stock returns that negative returns (losses) have heav- 
ier tails than positive returns (gains). The models we now introduce attempt to add 
some asymmetry to the class of normal mixtures by mixing normal distributions with 
different means as well as different variances; this yields the class of multivariate 
normal mean-variance mixtures. 


Definition 6.11. The random vector $X$ is said to have a (multivariate) normal 
mean-variance mixture distribution if 

$$X \overset{d}{=} m(W) + \sqrt{W} A Z, \tag{6.24}$$

where 

(i) $Z \sim N_k(0, I_k)$, 

(ii) $W \ge 0$ is a non-negative, scalar-valued rv which is independent of $Z$, 

(iii) $A \in \mathbb{R}^{d \times k}$ is a matrix, and 

(iv) $m : [0, \infty) \to \mathbb{R}^d$ is a measurable function. 


In this case we have that 

$$X \mid W = w \sim N_d(m(w), w\Sigma), \tag{6.25}$$

where $\Sigma = AA'$ and it is clear why such distributions are known as mean-variance 
mixtures of normals. In general, such distributions are not elliptical. 

A possible concrete specification for the function $m(W)$ in (6.25) is 

$$m(W) = \mu + W\gamma, \tag{6.26}$$

where $\mu$ and $\gamma$ are parameter vectors in $\mathbb{R}^d$. Since $E(X \mid W) = \mu + W\gamma$ and 
$\operatorname{cov}(X \mid W) = W\Sigma$, it follows in this case by simple calculations that 

$$E(X) = E(E(X \mid W)) = \mu + E(W)\gamma, \tag{6.27}$$

$$\operatorname{cov}(X) = E(\operatorname{cov}(X \mid W)) + \operatorname{cov}(E(X \mid W)) = E(W)\Sigma + \operatorname{var}(W)\gamma\gamma' \tag{6.28}$$

when the mixing variable $W$ has finite variance. We observe from (6.27) and (6.28) 
that the parameters $\mu$ and $\Sigma$ are not, in general, the mean vector and covariance 
matrix of $X$ (or a multiple thereof). This is only the case when $\gamma = 0$, so that the 
distribution is a normal variance mixture and the simpler moment formulas given 
in (6.19) apply. 
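
The moment formulas (6.27) and (6.28) are easily checked by simulation. The following sketch treats the univariate case $d = 1$, where $\Sigma = \sigma^2$, and assumes for simplicity a two-point mixing variable as in Example 6.6, for which $E(W) = pk_1 + (1 - p)k_2$ and $\operatorname{var}(W) = p(1 - p)(k_1 - k_2)^2$. The function names and parameter values are illustrative assumptions.

```python
import math
import random


def simulate_mean_variance_mixture(mu, gamma, sigma, k1, k2, p, n, seed=0):
    """Draw n values of X = mu + W * gamma + sqrt(W) * sigma * Z with a two-point W."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        w = k1 if rng.random() < p else k2  # two-point mixing variable
        z = rng.gauss(0.0, 1.0)
        draws.append(mu + w * gamma + math.sqrt(w) * sigma * z)
    return draws


def theoretical_moments(mu, gamma, sigma, k1, k2, p):
    """Mean and variance of X from (6.27) and (6.28) in the case d = 1."""
    e_w = p * k1 + (1.0 - p) * k2
    var_w = p * (1.0 - p) * (k1 - k2) ** 2
    mean = mu + e_w * gamma
    variance = e_w * sigma ** 2 + var_w * gamma ** 2
    return mean, variance


if __name__ == "__main__":
    params = dict(mu=0.1, gamma=-0.5, sigma=1.0, k1=1.0, k2=4.0, p=0.8)
    xs = simulate_mean_variance_mixture(n=100_000, **params)
    sample_mean = sum(xs) / len(xs)
    sample_var = sum((x - sample_mean) ** 2 for x in xs) / len(xs)
    print("theory:", theoretical_moments(**params))
    print("sample:", (sample_mean, sample_var))
```

With a negative $\gamma$ the mean is pulled below $\mu$ and the variance exceeds $E(W)\sigma^2$ by the term $\operatorname{var}(W)\gamma^2$, which is the effect of mixing over different means.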


6.2.3 Generalized Hyperbolic Distributions 


In Example 6.8 we looked at the special subclass of the generalized hyperbolic (GH) 
distributions consisting of the elliptically symmetric normal variance mixture distri- 
butions. The full GH family is obtained using the mean-variance mixture construc- 
tion (6.24) and the conditional mean specification (6.26). For the mixing distribution 
we assume that $W \sim N^-(\lambda, \chi, \psi)$, a GIG distribution with density (A.14). 


Remark 6.12. This class of distributions has received a lot of attention in the 
financial-modelling literature, particularly in the univariate case. An important 
reason for this attention is their link to Lévy processes, i.e. processes with independent 
and stationary increments (like Brownian motion or the compound Poisson process) 
that are used to model price processes in continuous time. For every GH 
distribution it is possible to construct a Lévy process so that the value of the increment 
of the process over a fixed time interval has that distribution; this is only possible 
because the GH law is a so-called infinitely divisible distribution, a property that it 
inherits from the GIG mixing distribution of $W$. 

The joint density in the non-singular case ($\Sigma$ has rank $d$) is 

$$f(x) = \int_0^\infty \frac{e^{(x - \mu)' \Sigma^{-1} \gamma}}{(2\pi)^{d/2} |\Sigma|^{1/2} w^{d/2}} \exp\left(-\frac{(x - \mu)' \Sigma^{-1} (x - \mu)}{2w} - \frac{\gamma' \Sigma^{-1} \gamma}{2/w}\right) h(w) \, dw,$$

where $h(w)$ is the density of $W$. Evaluation of this integral gives the GH density 

$$f(x) = c \, \frac{K_{\lambda - (d/2)}\big(\sqrt{(\chi + (x - \mu)' \Sigma^{-1} (x - \mu))(\psi + \gamma' \Sigma^{-1} \gamma)}\big) \, e^{(x - \mu)' \Sigma^{-1} \gamma}}{\big(\sqrt{(\chi + (x - \mu)' \Sigma^{-1} (x - \mu))(\psi + \gamma' \Sigma^{-1} \gamma)}\big)^{(d/2) - \lambda}}, \tag{6.29}$$

where the normalizing constant is 

$$c = \frac{(\sqrt{\chi\psi})^{-\lambda} \psi^{\lambda} (\psi + \gamma' \Sigma^{-1} \gamma)^{(d/2) - \lambda}}{(2\pi)^{d/2} |\Sigma|^{1/2} K_\lambda(\sqrt{\chi\psi})}.$$

Clearly, if $\gamma = 0$, the distribution reduces to the symmetric GH special case of 
Example 6.8. In general, we have a non-elliptical distribution with asymmetric margins. 
The mean vector and covariance matrix of the distribution are easily calculated 
from (6.27) and (6.28) using the information on the GIG and its moments given in 
Section A.2.5. The characteristic function of the GH distribution may be calculated 
using the same approach as in (6.20) to yield 

$$\phi_X(t) = E(e^{it'X}) = e^{it'\mu} \hat{H}(\tfrac{1}{2} t'\Sigma t - it'\gamma), \tag{6.30}$$

where $\hat{H}$ is the Laplace-Stieltjes transform of the GIG distribution. 

We adopt the notation $X \sim \mathrm{GH}_d(\lambda, \chi, \psi, \mu, \Sigma, \gamma)$. Note that the distributions 
$\mathrm{GH}_d(\lambda, \chi/k, k\psi, \mu, k\Sigma, k\gamma)$ and $\mathrm{GH}_d(\lambda, \chi, \psi, \mu, \Sigma, \gamma)$ are identical for 
any $k > 0$, which causes an identifiability problem when we attempt to estimate the 
parameters in practice. This can be solved by constraining the determinant $|\Sigma|$ to 
be a particular value (such as one) when fitting. Note that, while such a constraint 
will have an effect on the values of $\chi$ and $\psi$ that we estimate, it will not have an 
effect on the value of $\chi\psi$, so this product is a useful summary parameter for the GH 
distribution. 


Linear combinations. The GH class is closed under linear operations. 


Proposition 6.13. If $X \sim \mathrm{GH}_d(\lambda, \chi, \psi, \mu, \Sigma, \gamma)$ and $Y = BX + b$, where 
$B \in \mathbb{R}^{k \times d}$ and $b \in \mathbb{R}^k$, then $Y \sim \mathrm{GH}_k(\lambda, \chi, \psi, B\mu + b, B\Sigma B', B\gamma)$. 


Proof. We calculate, using (6.30) and a similar method to Proposition 6.9, that 

$$\phi_Y(t) = e^{it'(B\mu + b)} \hat{H}(\tfrac{1}{2} t'B\Sigma B't - it'B\gamma).$$

The parameters inherited from the GIG mixing distribution therefore remain unchanged 
under linear operations. This means, for example, that margins of $X$ are 
easy to calculate; we have that $X_i \sim \mathrm{GH}_1(\lambda, \chi, \psi, \mu_i, \Sigma_{ii}, \gamma_i)$. 

Parametrizations. There is a bewildering array of alternative parametrizations for 
the GH distribution in the literature, and it is more common to meet this distribution 
in a reparametrized form. In one common version the dispersion matrix we call 
$\Sigma$ is renamed $\Delta$ and the constraint that $|\Delta| = 1$ is imposed; this addresses the 
identifiability problem mentioned above. The skewness parameters $\gamma$ are replaced 
by parameters $\beta$, and the non-negative parameters $\chi$ and $\psi$ are replaced by the 
non-negative parameters $\delta$ and $\alpha$ according to 

$$\beta = \Delta^{-1}\gamma, \qquad \delta = \sqrt{\chi}, \qquad \alpha = \sqrt{\psi + \gamma' \Delta^{-1} \gamma}.$$

These parameters must satisfy the constraints $\delta \ge 0$, $\alpha^2 > \beta'\Delta\beta$ if $\lambda > 0$; $\delta > 0$, 
$\alpha^2 > \beta'\Delta\beta$ if $\lambda = 0$; and $\delta > 0$, $\alpha^2 \ge \beta'\Delta\beta$ if $\lambda < 0$. Blæsild (1981) uses this 
parametrization to show that GH distributions form a closed class of distributions 
under linear operations and conditioning. However, the parametrization does have 
the problem that the important parameters $\alpha$ and $\delta$ are not generally invariant under 
either of these operations. 

It is useful to be able to move easily between our $\chi$-$\psi$-$\Sigma$-$\gamma$ parametrization, 
as in (6.29), and the $\alpha$-$\delta$-$\Delta$-$\beta$ parametrization; $\lambda$ and $\mu$ are common to both 
parametrizations. If the $\chi$-$\psi$-$\Sigma$-$\gamma$ parametrization is used, then the formulas for 
obtaining the other parametrization are 

$$\Delta = |\Sigma|^{-1/d} \Sigma, \qquad \beta = \Sigma^{-1}\gamma,$$

$$\delta = \sqrt{\chi |\Sigma|^{1/d}}, \qquad \alpha = \sqrt{|\Sigma|^{-1/d} (\psi + \gamma' \Sigma^{-1} \gamma)}.$$

If the $\alpha$-$\delta$-$\Delta$-$\beta$ form is used, then we can obtain our parametrization by setting 

$$\Sigma = \Delta, \qquad \gamma = \Delta\beta, \qquad \chi = \delta^2, \qquad \psi = \alpha^2 - \beta'\Delta\beta.$$
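
The two sets of conversion formulas can be sketched in code for the bivariate case. The helper functions below are illustrative assumptions written for $d = 2$ with plain tuples; they are not taken from any library. A round trip starting from an unnormalized $\Sigma$ returns the equivalent parameter set with $|\Sigma| = 1$, that is $(\chi/k, k\psi, k\Sigma, k\gamma)$ with $k = |\Sigma|^{-1/d}$, and leaves the product $\chi\psi$ unchanged.

```python
import math


def det2(m):
    """Determinant of a 2x2 matrix given as nested tuples."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def inv2(m):
    """Inverse of a 2x2 matrix."""
    det = det2(m)
    return ((m[1][1] / det, -m[0][1] / det), (-m[1][0] / det, m[0][0] / det))


def matvec(m, v):
    """Matrix-vector product for a 2x2 matrix and a 2-vector."""
    return (m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1])


def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]


def to_alpha_delta(chi, psi, sigma, gamma):
    """Map (chi, psi, Sigma, gamma) to (alpha, delta, Delta, beta) for d = 2."""
    d = 2
    scale = det2(sigma) ** (1.0 / d)  # |Sigma|^{1/d}
    delta_mat = tuple(tuple(s / scale for s in row) for row in sigma)  # |Delta| = 1
    beta = matvec(inv2(sigma), gamma)  # beta = Sigma^{-1} gamma
    delta = math.sqrt(chi * scale)
    alpha = math.sqrt((psi + dot(gamma, beta)) / scale)  # gamma' Sigma^{-1} gamma = gamma . beta
    return alpha, delta, delta_mat, beta


def to_chi_psi(alpha, delta, delta_mat, beta):
    """Map (alpha, delta, Delta, beta) back to (chi, psi, Sigma, gamma)."""
    gamma = matvec(delta_mat, beta)  # gamma = Delta beta
    chi = delta ** 2
    psi = alpha ** 2 - dot(beta, gamma)  # beta' Delta beta = beta . gamma
    return chi, psi, delta_mat, gamma


if __name__ == "__main__":
    chi, psi = 2.0, 3.0
    sigma = ((4.0, 1.0), (1.0, 2.0))
    gamma = (0.3, -0.2)
    converted = to_alpha_delta(chi, psi, sigma, gamma)
    chi_n, psi_n, sigma_n, gamma_n = to_chi_psi(*converted)
    print(chi * psi, chi_n * psi_n, det2(sigma_n))
```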


Special cases. The multivariate GH family is extremely flexible and, as we have 
mentioned, contains many special cases known by alternative names. 


• If $\lambda = \tfrac{1}{2}(d + 1)$, we drop the word “generalized” and refer to the distribution 
as a d-dimensional hyperbolic distribution. Note that the univariate margins 
of this distribution also have $\lambda = \tfrac{1}{2}(d + 1)$ and are not one-dimensional 
hyperbolic distributions. 


• If $\lambda = 1$, we get a multivariate distribution whose univariate margins are 
one-dimensional hyperbolic distributions. The one-dimensional hyperbolic 
distribution has been widely used in univariate analyses of financial return 
data (see Notes and Comments). 


• If $\lambda = -\tfrac{1}{2}$, then the distribution is known as an NIG distribution. In the 
univariate case this model has also been used in analyses of return data; its 
functional form is similar to the hyperbolic distribution but with a slightly 
heavier tail. (Note that the NIG and the GIG are different distributions!) 


• If $\lambda > 0$ and $\chi = 0$, we get a limiting case of the distribution known variously 
as a generalized Laplace, Bessel function or VG distribution. 


• If $\lambda = -\tfrac{1}{2}\nu$, $\chi = \nu$ and $\psi = 0$, we get another limiting case that seems 
to have been less well studied; it could be called an asymmetric or skewed 
t distribution. Evaluating the limit of (6.29) as $\psi \to 0$ yields the multivariate 
density 

$$f(x) = c \, \frac{K_{(\nu + d)/2}\big(\sqrt{(\nu + Q(x)) \gamma' \Sigma^{-1} \gamma}\big) \exp\big((x - \mu)' \Sigma^{-1} \gamma\big)}{\big(\sqrt{(\nu + Q(x)) \gamma' \Sigma^{-1} \gamma}\big)^{-(\nu + d)/2} \big(1 + Q(x)/\nu\big)^{(\nu + d)/2}}, \tag{6.31}$$

where $Q(x) = (x - \mu)' \Sigma^{-1} (x - \mu)$ and the normalizing constant is 

$$c = \frac{2^{1 - (\nu + d)/2}}{\Gamma(\tfrac{1}{2}\nu) (\pi\nu)^{d/2} |\Sigma|^{1/2}}.$$

This density reduces to the standard multivariate t density in (6.22) as $\gamma \to 0$. 


6.2.4 Empirical Examples 


In this section we fit the multivariate GH distribution to real data and examine which 
of the subclasses (such as t, hyperbolic or NIG) are most useful; we also explore 
whether the general mean-variance mixture models can be replaced by (elliptically 
symmetric) variance mixtures. Our first example prepares the ground for multivariate 
examples by looking briefly at univariate models. The univariate distributions are 
fitted by straightforward numerical maximization of the log-likelihood. The multi- 
variate distributions are fitted by using a variant of the EM algorithm, as described 
in Section 15.1.1. 


Example 6.14 (univariate stock returns). In the literature, the NIG, hyperbolic 
and t models have been particularly popular special cases. We fit symmetric and 
asymmetric cases of these distributions to the data used in Example 6.3, restricting 
attention to daily and weekly returns, where the data are more plentiful (n = 2020 
and n = 468, respectively). Models are fitted using maximum likelihood under 
the simplifying assumption that returns form iid samples; a simple quasi-Newton 
method provides a viable alternative to the EM algorithm in the univariate case. 

In the upper two panels of Table 6.2 we show results for symmetric models. The 
t, NIG and hyperbolic models may be compared directly using the log-likelihood 
at the maximum, since all have the same number of parameters: for daily data we 
find that eight out of ten stocks prefer the t distribution to the hyperbolic and NIG 
distributions; for weekly returns the t distribution is favoured in six out of ten cases. 
Overall, the second best model appears to be the NIG distribution. The mixture 
models fit much better than the Gaussian model in all cases, and it may be easily 
verified using the Akaike information criterion (AIC) that they are preferred to the 
Gaussian model in a formal comparison (see Section A.3.6 for more on the AIC). 
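
As a worked illustration of the AIC comparison, the sketch below uses the daily log-likelihoods for American Express (AXP) reported in Table 6.2 and the usual definition AIC = 2k - 2 ln L, where k is the number of parameters. The parameter counts are our assumption: two for the Gaussian model (location and scale) and three for each symmetric mixture model (one additional shape parameter). The function and variable names are illustrative.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L (smaller values are preferred)."""
    return 2.0 * n_params - 2.0 * log_likelihood


# (maximized log-likelihood from Table 6.2, assumed number of parameters)
AXP_DAILY = {
    "Gauss": (4945.7, 2),
    "t": (5001.8, 3),
    "NIG": (5002.4, 3),
    "hyperbolic": (5002.1, 3),
}


def rank_by_aic(models):
    """Return model names ordered from smallest (best) to largest AIC."""
    scores = {name: aic(ll, k) for name, (ll, k) in models.items()}
    return sorted(scores, key=scores.get), scores


if __name__ == "__main__":
    ranking, scores = rank_by_aic(AXP_DAILY)
    for name in ranking:
        print(f"{name:<12s} AIC = {scores[name]:.1f}")
```

Since the three mixture models are assumed to have the same number of parameters, ranking them by AIC is the same as ranking them by log-likelihood; the penalty term matters only in the comparison with the Gaussian model, where one extra parameter buys an improvement of more than fifty units of log-likelihood.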

Table 6.2. Comparison of univariate models in the GH family, showing estimates of selected 
parameters and the value of the log-likelihood at the maximum; bold numbers indicate the 
models that give the largest values of the log-likelihood. See Example 6.14 for commentary. 

         Gauss     t model          NIG model          Hyperbolic model 
Stock    ln L      ν      ln L      √(χψ)   ln L       √(χψ)   ln L 

Daily returns: symmetric models 
AXP      4945.7    5.8    5001.8    1.6     5002.4     1.3     5002.1 
EK       5112.9    3.8    5396.2    0.8     5382.5     0.6     5366.0 
BA       5054.9    3.8    5233.5    0.8     5229.1     0.5     5221.2 
C        4746.6    6.3    4809.5    1.9     4806.8     1.7     4805.0 
KO       5319.6    5.1    5411.0    1.4     5407.3     1.3     5403.3 
MSFT     4724.3    5.8    4814.6    1.6     4809.5     1.5     4806.4 
HWP      4480.1    4.5    4588.8    1.1     4587.2     0.9     4583.4 
INTC     4392.3    5.4    4492.2    1.5     4486.7     1.4     4482.4 
JPM      4898.3    5.1    4967.8    1.3     4969.5     0.9     4969.7 
DIS      5047.2    4.4    5188.3    1.0     5183.8     0.8     5177.6 

Weekly returns: symmetric models 
AXP      719.9     8.8    724.2     3.0     724.3      2.8     724.3 
EK       718.7     3.6    765.6     0.7     764.0      0.5     761.3 
BA       732.4     4.4    759.2     1.0     758.3      0.8     757.2 
C        656.0     5.7    669.6     1.6     669.3      1.3     669.0 
KO       757.1     6.0    765.7     1.7     766.2      1.3     766.3 
MSFT     671.5     6.3    683.9     1.9     683.2      1.8     682.9 
HWP      627.1     6.0    637.3     1.8     637.3      1.5     637.1 
INTC     595.8     5.2    611.0     1.5     610.6      1.3     610.0 
JPM      681.7     5.9    693.0     1.7     692.9      1.5     692.6 
DIS      734.1     6.4    742.7     1.9     742.8      1.7     742.7 

Weekly returns: asymmetric models 
C        NA        6.1    671.4     1.7     671.3      1.3     671.2 
INTC     NA        6.3    614.2     1.8     613.9      1.7     613.3 


For the asymmetric models, we only show cases where at least one of the asymmetric 
t, NIG or hyperbolic models offered a significant improvement (p < 0.05) 
on the corresponding symmetric model according to a likelihood ratio test. This 
occurred for weekly returns on Citigroup (C) and Intel (INTC) but for no daily 
returns. For Citigroup the p-values of the tests were, respectively, 0.06, 0.04 and 
0.04 for the t, NIG and hyperbolic cases; for Intel the p-values were 0.01 in all 
cases, indicating quite strong asymmetry. 
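
The likelihood ratio tests behind these p-values compare each asymmetric model with its symmetric special case, which has one parameter fewer. The sketch below assumes the usual asymptotic chi-squared reference distribution with one degree of freedom, for which the tail probability can be written as $P(\chi^2_1 > x) = \operatorname{erfc}(\sqrt{x/2})$; the function name is ours. Because the log-likelihoods in Table 6.2 are rounded to one decimal place, the p-values obtained in this way can differ slightly from those quoted in the text.

```python
import math


def lr_test_one_df(loglik_restricted, loglik_general):
    """Likelihood ratio statistic and asymptotic p-value for one extra parameter."""
    stat = 2.0 * (loglik_general - loglik_restricted)
    # Survival function of the chi-squared distribution with one degree of freedom.
    p_value = math.erfc(math.sqrt(max(stat, 0.0) / 2.0))
    return stat, p_value


if __name__ == "__main__":
    # Weekly returns, symmetric t versus asymmetric t (values from Table 6.2).
    for stock, sym, asym in (("C", 669.6, 671.4), ("INTC", 611.0, 614.2)):
        stat, p = lr_test_one_df(sym, asym)
        print(f"{stock}: statistic = {stat:.1f}, p-value = {p:.3f}")
```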

In the case of Intel we have superimposed the densities of various fitted asymmet- 
ric distributions on a histogram of the data in Figure 6.3. A plot of the log densities 
shown alongside reveals the differences between the distributions in the tail area. 
The left tail (corresponding to losses) appears to be heavier for these data, and the 
best-fitting distribution according to the likelihood comparison is the asymmetric 
t distribution. 


Figure 6.3. Models for weekly returns on Intel (INTC). 


Example 6.15 (multivariate stock returns). We fitted multivariate models to the 
full ten-dimensional data set of log-returns used in the previous example. The resulting 
values of the maximized log-likelihood are shown in Table 6.3 along with p-values 
for a likelihood ratio test of all special cases against the (asymmetric) GH 
model. The number of parameters in each model is also given; note that the general 
d-dimensional GH model has $\tfrac{1}{2}d(d + 1)$ dispersion parameters, $d$ location parameters, 
$d$ skewness parameters and three parameters coming from the GIG mixing distribution, 
but is subject to one identifiability constraint; this gives $\tfrac{1}{2}(d(d + 5) + 4)$ 
free parameters. 
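
The parameter count can be reproduced with a short calculation. In the sketch below the function names are ours, and treating each special case (NIG, hyperbolic, t, VG) as the general model with one mixing parameter fixed is a bookkeeping assumption, made because it reproduces the counts reported in Tables 6.3 and 6.4.

```python
def gh_parameter_count(d, symmetric=False, fixed_mixing_params=0):
    """Number of free parameters in a d-dimensional GH model.

    symmetric           : drop the d skewness parameters (gamma = 0)
    fixed_mixing_params : number of GIG parameters held fixed (assumed to be
                          1 for the special cases, 0 for the general model)
    """
    dispersion = d * (d + 1) // 2
    location = d
    skewness = 0 if symmetric else d
    mixing = 3 - fixed_mixing_params
    identifiability_constraints = 1
    return dispersion + location + skewness + mixing - identifiability_constraints


def gaussian_parameter_count(d):
    """Mean vector plus covariance matrix of a d-dimensional normal."""
    return d + d * (d + 1) // 2


if __name__ == "__main__":
    for d in (10, 4):
        print(d, gh_parameter_count(d), gh_parameter_count(d, fixed_mixing_params=1),
              gh_parameter_count(d, symmetric=True), gaussian_parameter_count(d))
```

For the general model the count agrees with the formula $\tfrac{1}{2}(d(d + 5) + 4)$, giving 77 parameters for the ten-dimensional stock data and 20 for the four-dimensional exchange-rate data.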

For the daily data the best of the special cases is the skewed t distribution, which 
gives a value for the maximized likelihood that cannot be discernibly improved 
by the more general model with its additional parameter. All other non-elliptically 
symmetric submodels are rejected in a likelihood ratio test. Note, however, that the 
elliptically symmetric t distribution cannot be rejected when compared with the 
most general model, so that this seems to offer a simple parsimonious model for 
these data (the estimated degree of freedom is 6.0). 

For the weekly data the best special case is the NIG distribution, followed closely 
by the skewed t; the hyperbolic and VG are rejected. The best elliptically symmetric 
special case seems to be the t distribution (the estimated degree of freedom this time 
being 6.2). 


Table 6.3. A comparison of models in the GH family for ten-dimensional stock-return data. 
For each model, the table shows the value of the log-likelihood at the maximum (ln L), the 
numbers of parameters (“# par.”) and the p-value for a likelihood ratio test against the general 
GH model. The log-likelihood values for the general model, the best special case and the best 
elliptically symmetric special case are in bold type. See Example 6.15 for details. 

            GH          NIG         Hyperbolic   t           VG          Gauss 

Daily returns: asymmetric models 
ln L        52174.62    52141.45    52111.65     52174.62    52063.44 
# par.      77          76          76           76          76 
p-value                 0.00        0.00         1.00        0.00 

Daily returns: symmetric models 
ln L        52170.14    52136.55    52106.34     52170.14    52057.38    50805.28 
# par.      67          66          66           66          66          65 
p-value     0.54        0.00        0.00         0.63        0.00        0.00 

Weekly returns: asymmetric models 
ln L        7639.32     7638.59     7636.49      7638.56     7631.33 
p-value                 0.23        0.02         0.22        0.00 

Weekly returns: symmetric models 
ln L        7633.65     7632.68     7630.44      7633.11     7625.4      7433.77 
p-value     0.33        0.27        0.09         0.33        0.00        0.00 


Example 6.16 (multivariate exchange-rate returns). We fitted the same multivariate 
models to a four-dimensional data set of exchange-rate log-returns, these 
being sterling, the euro, Japanese yen and Swiss franc against the US dollar for 
the period January 2000 to the end of March 2004 (1067 daily returns and 222 
weekly returns). The resulting values of the maximized log-likelihood are shown in 
Table 6.4. 


Table 6.4. A comparison of models in the GH family for four-dimensional exchange-rate 
return data. For each model, the table shows the value of the log-likelihood at the maximum 
(ln L), the numbers of parameters (“# par.”) and the p-value for a likelihood ratio test against 
the general GH model. The log-likelihood values for the general model, the best special case 
and the best elliptically symmetric special case are in bold type. See Example 6.16 for details. 

            GH          NIG         Hyperbolic   t           VG          Gauss 

Daily returns: asymmetric models 
ln L        17306.44    17306.43    17305.61     17304.97    17302.5 
# par.      20          19          19           19          19 
p-value                 0.85        0.20         0.09        0.00 

Daily returns: symmetric models 
ln L        17303.10    17303.06    17302.15     17301.85    17299.15    17144.38 
# par.      16          15          15           15          15          14 
p-value     0.15        0.24        0.13         0.10        0.01        0.00 

Weekly returns: asymmetric models 
ln L        2890.65     2889.90     2889.65      2890.65     2888.98 
p-value                 0.22        0.16         1.00        0.07 

Weekly returns: symmetric models 
ln L        2887.52     2886.74     2886.48      2887.52     2885.86     2872.36 
p-value     0.18        0.17        0.14         0.28        0.09        0.00 


For the daily data the best of the special cases (both in general and if we restrict 
ourselves to symmetric models) is the NIG distribution, followed by the hyperbolic, 
t and VG distributions in that order. In a likelihood ratio test of the special cases 
against the general GH distribution, only the VG model is rejected at the 5% level; 
the skewed t model is rejected at the 10% level. When tested against the full model, 
certain elliptical models could not be rejected, the best of these being the NIG. 

For the weekly data the best special case is the t distribution, followed by the NIG, 
hyperbolic and VG; none of the special cases can be rejected in a test at the 5% level, 
although the VG model is rejected at the 10% level. Among the elliptically symmetric 
distributions the Gauss distribution is clearly rejected, and the VG is again rejected 
at the 10% level, but otherwise the elliptical special cases are accepted; the best 
of these seems to be the t distribution, which has an estimated degrees-of-freedom 
parameter of 5.99. 


Notes and Comments 


Important early papers on multivariate normal mixtures are Kelker (1970) and Cam- 
banis, Huang and Simons (1981). See also Bingham and Kiesel (2002), which con- 
tains an overview of the connections between the normal mixture, elliptical and 
hyperbolic models, and discusses their role in financial modelling. Fang, Kotz and 
Ng (1990) discuss the symmetric normal mixture models as special cases in their 
account of the more general family of spherical and elliptical distributions. 

The GH distributions (univariate and multivariate) were introduced in 
Barndorff-Nielsen (1978) and further explored in Barndorff-Nielsen and Blæsild (1981). 
Useful references on the multivariate distribution are Blæsild (1981) and Blæsild and 
Jensen (1981). Generalized hyperbolic distributions (particularly in the univariate 
case) have been popularized as models for financial returns in recent papers by Eber- 
lein and Keller (1995) and Eberlein, Keller and Prause (1998) (see also Bibby and 
Sørensen 2003). The PhD thesis of Prause (1999) is also a compendium of useful 
information in this context. 

The reasons for their popularity in financial applications are both empirical and 
theoretical: they appear to provide a good fit to financial return data (again mostly in 
univariate investigations); they are consistent with continuous-time models, where 
logarithmic asset prices follow univariate or multivariate Lévy processes (thus 
generalizing the Black-Scholes model, where logarithmic prices follow Brownian 
motion); see Eberlein and Keller (1995) and Schoutens (2003). 

For the NIG special case see Barndorff-Nielsen (1997), who discusses both uni- 
variate and multivariate cases and argues that the NIG is slightly superior to the 
hyperbolic as a univariate model for return data, a claim that our analyses support 
for stock-return data. Kotz, Kozubowski and Podgórski (2001) is a useful refer- 
ence for the VG special case; the distribution appears here under the name general- 
ized Laplace distribution and a (univariate or multivariate) Lévy process with VG- 
distributed increments is called a Laplace motion. The univariate Laplace motion 
is essentially the model proposed by Madan and Seneta (1990), who derived it as a 
Brownian motion under a stochastic time change and referred to it as the VG model 
(see also Madan, Carr and Chang 1998). The multivariate t distribution is discussed 
in Kotz and Nadarajah (2004); the asymmetric or skewed t distribution presented in 
this chapter is also discussed in Bibby and Sørensen (2003). For alternative skewed 
extensions of the multivariate t, see Kotz and Nadarajah (2004) and Genton (2004). 


6.3 Spherical and Elliptical Distributions 


In the previous section we observed that normal variance mixture distributions 
(particularly the multivariate t and symmetric multivariate NIG) provided models 
that were far superior to the multivariate normal for daily and weekly US stock-return 
data. The more general asymmetric mean-variance mixture distributions did not 
seem to offer much of an improvement on the symmetric variance mixture models. 
While this was a single example, other investigations suggest that multivariate return 
data for groups of returns of a similar type often show similar behaviour. 

The normal variance mixture distributions are so-called elliptical distributions, 
and in this section we look more closely at the theory of elliptical distributions. To 
do this we begin with the special case of spherical distributions. 


6.3.1 Spherical Distributions 


The spherical family constitutes a large class of distributions for random vectors 
with uncorrelated components and identical, symmetric marginal distributions. It is 
important to note that within this class, $N_d(0, I_d)$ is the only model for a vector of
mutually independent components. Many of the properties of elliptical distributions 
can best be understood by beginning with spherical distributions. 


Definition 6.17. A random vector $X = (X_1, \ldots, X_d)'$ has a spherical distribution
if, for every orthogonal map $U \in \mathbb{R}^{d \times d}$ (i.e. maps satisfying $UU' = U'U = I_d$),


$$UX \stackrel{\text{d}}{=} X.$$


Thus spherical random vectors are distributionally invariant under rotations. There 
are a number of different ways of defining distributions with this property, as we 
demonstrate below. 


Theorem 6.18. The following are equivalent. 
(1) X is spherical. 


(2) There exists a function $\psi$ of a scalar variable such that, for all $t \in \mathbb{R}^d$,


$$\phi_X(t) = E(e^{it'X}) = \psi(t't) = \psi(t_1^2 + \cdots + t_d^2). \tag{6.32}$$


(3) For every $a \in \mathbb{R}^d$,

$$a'X \stackrel{\text{d}}{=} \|a\| X_1, \tag{6.33}$$

where $\|a\|^2 = a'a = a_1^2 + \cdots + a_d^2$.

Proof. (1) $\Rightarrow$ (2). If $X$ is spherical, then for any orthogonal matrix $U$ we have

$$\phi_X(t) = \phi_{UX}(t) = E(e^{it'UX}) = \phi_X(U't).$$

This can only be true if $\phi_X(t)$ only depends on the length of $t$, i.e. if $\phi_X(t) = \psi(t't)$
for some function $\psi$ of a non-negative scalar variable.

(2) $\Rightarrow$ (3). First observe that $\phi_{X_1}(t) = E(e^{itX_1}) = \phi_X(te_1) = \psi(t^2)$, where $e_1$
denotes the first unit vector in $\mathbb{R}^d$. It follows that for any $a \in \mathbb{R}^d$,

$$\phi_{a'X}(t) = \phi_X(ta) = \psi(t^2 a'a) = \psi(t^2\|a\|^2) = \phi_{X_1}(t\|a\|) = \phi_{\|a\|X_1}(t).$$

(3) $\Rightarrow$ (1). For any orthogonal matrix $U$ we have

$$\phi_{UX}(t) = E(e^{i(U't)'X}) = E(e^{i\|U't\|X_1}) = E(e^{i\|t\|X_1}) = E(e^{it'X}) = \phi_X(t).$$


Part (2) of Theorem 6.18 shows that the characteristic function of a spherically
distributed random vector is fully described by a function $\psi$ of a scalar variable. For
this reason $\psi$ is known as the characteristic generator of the spherical distribution
and the notation $X \sim S_d(\psi)$ is used. Part (3) of Theorem 6.18 shows that linear
combinations of spherical random vectors always have a distribution of the same 
type, so that they have the same distribution up to changes of location and scale 
(see Section A.1.1). This important property will be used in Chapter 8 to prove 
the subadditivity of value-at-risk for linear portfolios of elliptically distributed risk 
factors. We now give examples of spherical distributions. 


Example 6.19 (multivariate normal). A random vector $X$ with the standard uncorrelated
normal distribution $N_d(0, I_d)$ is clearly spherical. The characteristic function
is

$$\phi_X(t) = E(e^{it'X}) = e^{-t't/2},$$

so that, using part (2) of Theorem 6.18, $X \sim S_d(\psi)$ with characteristic generator
$\psi(t) = e^{-t/2}$.


Example 6.20 (normal variance mixtures). A random vector $X$ with a standardized,
uncorrelated normal variance mixture distribution $M_d(0, I_d, \hat{H})$ also has a
spherical distribution. Using (6.20), we see that $\phi_X(t) = \hat{H}(\tfrac{1}{2}t't)$, which obviously
satisfies (6.32), and the characteristic generator of the spherical distribution is
related to the Laplace-Stieltjes transform of the mixture distribution function of $W$
by $\psi(t) = \hat{H}(\tfrac{1}{2}t)$. Thus $X \sim M_d(0, I_d, \hat{H}(\cdot))$ and $X \sim S_d(\hat{H}(\tfrac{1}{2}\,\cdot))$ are two ways
of writing the same mixture distribution.


A further, extremely important, way of characterizing spherical distributions is 
given by the following result. 
Theorem 6.21. $X$ has a spherical distribution if and only if it has the stochastic
representation

$$X \stackrel{\text{d}}{=} RS, \tag{6.34}$$

where $S$ is uniformly distributed on the unit sphere $\mathcal{S}^{d-1} = \{s \in \mathbb{R}^d : s's = 1\}$ and
$R \geq 0$ is a radial rv, independent of $S$.

Proof. First we prove that if $S$ is uniformly distributed on the unit sphere and $R \geq 0$
is an independent scalar variable, then $RS$ has a spherical distribution. This is seen
by considering the characteristic function


$$\phi_{RS}(t) = E(e^{iRt'S}) = E(E(e^{iRt'S} \mid R)).$$


Since $S$ is itself spherically distributed, its characteristic function has a characteristic
generator, which is usually given the special notation $\Omega_d$. Thus, by Theorem 6.18 (2)
we have that


$$\phi_{RS}(t) = E(\Omega_d(R^2 t't)) = \int_0^\infty \Omega_d(r^2 t't)\,dF(r), \tag{6.35}$$


where $F$ is the df of $R$. Since this is a function of $t't$, it follows, again from
Theorem 6.18 (2), that $RS$ has a spherical distribution.

We now prove that if the random vector $X$ is spherical, then it has the representation
(6.34). For any arbitrary $s \in \mathcal{S}^{d-1}$, the characteristic generator $\psi$ of $X$ must
satisfy $\psi(t't) = \phi_X(t) = \phi_X(\|t\|s)$. It follows that, if we introduce a random vector
$S$ that is uniformly distributed on the sphere $\mathcal{S}^{d-1}$, we can write


$$\psi(t't) = \int_{\mathcal{S}^{d-1}} \phi_X(\|t\|s)\,dF_S(s) = \int_{\mathcal{S}^{d-1}} E(e^{i\|t\|s'X})\,dF_S(s).$$


Interchanging the order of integration and using the $\Omega_d$ notation for the characteristic
generator of $S$ we have


$$\psi(t't) = E(\Omega_d(\|t\|^2\|X\|^2)) = \int_0^\infty \Omega_d(t't\,r^2)\,dF_{\|X\|}(r), \tag{6.36}$$


where $F_{\|X\|}$ is the df of $\|X\|$. By comparison with (6.35) we see that (6.36) is the
characteristic function of $RS$, where $R$ is an rv with df $F_{\|X\|}$ that is independent
of $S$.


We often exclude from consideration distributions that place point mass at the
origin; that is, we consider spherical rvs $X$ in the subclass $S_d^+(\psi)$ for which
$P(X = 0) = 0$. A particularly useful corollary of Theorem 6.21 is then the following result,
which is used in Section 15.1.2 to devise tests for spherical and elliptical symmetry.


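Corollary 6.22. Suppose $X \stackrel{\text{d}}{=} RS \sim S_d^+(\psi)$. Then


$$\left(\|X\|, \frac{X}{\|X\|}\right) \stackrel{\text{d}}{=} (R, S). \tag{6.37}$$


Proof. Let $f_1(x) = \|x\|$ and $f_2(x) = x/\|x\|$. It follows from (6.34) that


$$\left(\|X\|, \frac{X}{\|X\|}\right) = (f_1(X), f_2(X)) \stackrel{\text{d}}{=} (f_1(RS), f_2(RS)) = (R, S).$$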




Example 6.23 (working with R and S). Suppose $X \sim N_d(0, I_d)$. Since $X'X \sim \chi^2_d$,
a chi-squared distribution with $d$ degrees of freedom, it follows from (6.37) that
$R^2 \sim \chi^2_d$.

We can use this fact to calculate $E(S)$ and $\operatorname{cov}(S)$, the first two moments of a
uniform distribution on the unit sphere. We have that


$$0 = E(X) = E(R)E(S) \;\Rightarrow\; E(S) = 0,$$
$$I_d = \operatorname{cov}(X) = E(R^2)\operatorname{cov}(S) \;\Rightarrow\; \operatorname{cov}(S) = I_d/d, \tag{6.38}$$


since $E(R^2) = d$ when $R^2 \sim \chi^2_d$.

Now suppose that $X$ has a spherical normal variance mixture distribution
$X \sim M_d(0, I_d, \hat{H})$ and we wish to calculate the distribution of $R^2 \stackrel{\text{d}}{=} X'X$ in this case.
Since $X \stackrel{\text{d}}{=} \sqrt{W}\,Y$, where $Y \sim N_d(0, I_d)$ and $W$ is independent of $Y$, it follows that
$R^2 \stackrel{\text{d}}{=} W\tilde{R}^2$, where $\tilde{R}^2 \sim \chi^2_d$ and $W$ and $\tilde{R}$ are independent. If we can calculate the
distribution of the product of $W$ and an independent chi-squared variate, then we
have the distribution of $R^2$.

For a concrete example suppose that $X \sim t_d(\nu, 0, I_d)$. For a multivariate t distribution
we know from Example 6.7 that $W \sim \mathrm{Ig}(\tfrac{1}{2}\nu, \tfrac{1}{2}\nu)$, which means that
$\nu/W \sim \chi^2_\nu$. Using the fact that the ratio of independent chi-squared rvs divided by
their degrees of freedom is F-distributed, it may be calculated that $R^2/d \sim F(d, \nu)$,
the F distribution on $d$ and $\nu$ degrees of freedom (see Section A.2.3). Since an
$F(d, \nu)$ distribution has mean $\nu/(\nu - 2)$ when $\nu > 2$, it follows from (6.38) that


$$\operatorname{cov}(X) = E(\operatorname{cov}(RS \mid R)) = E(R^2 I_d/d) = \frac{\nu}{\nu - 2}\, I_d.$$
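
The moment calculations in Example 6.23 can be checked by simulation. The following is a minimal sketch in Python with NumPy; the function name, sample size, dimension and degrees of freedom are illustrative choices and are not taken from the text. It draws S by normalizing standard normal vectors, as Corollary 6.22 permits, and builds the squared radial part of a multivariate t from the product representation described above.

```python
import numpy as np

def uniform_on_sphere(n, d, rng):
    """Draw n points uniformly on the unit sphere by normalizing N_d(0, I_d) draws."""
    z = rng.standard_normal((n, d))
    return z / np.linalg.norm(z, axis=1, keepdims=True)

rng = np.random.default_rng(0)
n, d, nu = 200_000, 3, 8.0           # illustrative values (assumptions)

S = uniform_on_sphere(n, d, rng)
print(S.mean(axis=0))                # close to the zero vector
print(np.cov(S, rowvar=False))       # close to I_d / d

# Squared radial part of t_d(nu, 0, I_d): R^2 = W * chi2_d, where nu / W ~ chi2_nu
W = nu / rng.chisquare(nu, size=n)
R2 = W * rng.chisquare(d, size=n)
print(R2.mean() / d)                 # close to nu / (nu - 2)
```

With a large sample the three printed quantities should be close to 0, I_d/d and ν/(ν − 2), respectively, in line with (6.38) and the mean of the F(d, ν) distribution.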


The normal mixtures with $\mu = 0$ and $\Sigma = I_d$ represent an easily understood
subgroup of the spherical distributions. There are other spherical distributions that
cannot be represented as normal variance mixtures; an example is the distribution
of the uniform vector $S$ on $\mathcal{S}^{d-1}$ itself. However, the normal mixtures have a special
role in the spherical world, as summarized by the following theorem.


Theorem 6.24. Denote by $\Psi_\infty$ the set of characteristic generators that generate a
$d$-dimensional spherical distribution for arbitrary $d \geq 1$. Then $X \sim S_d(\psi)$ with
$\psi \in \Psi_\infty$ if and only if $X \stackrel{\text{d}}{=} \sqrt{W}Z$, where $Z \sim N_d(0, I_d)$ is independent of $W \geq 0$.


Proof. This is proved in Fang, Kotz and Ng (1990, pp. 48-51). 


Thus, the characteristic generators of normal mixtures generate spherical distri- 
butions in arbitrary dimensions, while other spherical generators may only be used 
in certain dimensions. A concrete example is given by the uniform distribution on 
the unit sphere. Let $\Omega_d$ denote the characteristic generator of the uniform vector
$S = (S_1, \ldots, S_d)'$ on $\mathcal{S}^{d-1}$. It can be shown that $\Omega_d((t_1, \ldots, t_{d+1})'(t_1, \ldots, t_{d+1}))$
is not the characteristic function of a spherical distribution in $\mathbb{R}^{d+1}$ (for more details
see Fang, Kotz and Ng (1990, pp. 70-72)).

If a spherical distribution has a density $f$, then, by using the inversion formula


$$f(x) = \frac{1}{(2\pi)^d}\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} e^{-it'x}\phi_X(t)\,dt_1\cdots dt_d,$$

it is easily inferred from Theorem 6.18 that $f(x) = f(Ux)$ for any orthogonal
matrix $U$, so that the density must be of the form


$$f(x) = g(x'x) = g(x_1^2 + \cdots + x_d^2) \tag{6.39}$$


for some function $g$ of a scalar variable, which is referred to as the density generator.
Clearly, the joint density is constant on hyperspheres $\{x : x_1^2 + \cdots + x_d^2 = c\}$ in $\mathbb{R}^d$.
To give a single example, the density generator of the multivariate t (i.e. the model
$X \sim t_d(\nu, 0, I_d)$ of Example 6.7) is


$$g(x) = \frac{\Gamma(\tfrac{1}{2}(\nu + d))}{\Gamma(\tfrac{1}{2}\nu)(\pi\nu)^{d/2}}\left(1 + \frac{x}{\nu}\right)^{-(\nu + d)/2}.$$


6.3.2 Elliptical Distributions 


Definition 6.25. $X$ has an elliptical distribution if

$$X \stackrel{\text{d}}{=} \mu + AY,$$


where $Y \sim S_k(\psi)$ and $A \in \mathbb{R}^{d \times k}$ and $\mu \in \mathbb{R}^d$ are a matrix and vector of constants,
respectively.


In other words, elliptical distributions are obtained by multivariate affine trans- 
formations of spherical distributions. Since the characteristic function is 


$$\phi_X(t) = E(e^{it'X}) = E(e^{it'(\mu + AY)}) = e^{it'\mu}E(e^{i(A't)'Y}) = e^{it'\mu}\psi(t'\Sigma t),$$

where $\Sigma = AA'$, we denote the elliptical distributions by


$$X \sim E_d(\mu, \Sigma, \psi)$$


and refer to $\mu$ as the location vector, $\Sigma$ as the dispersion matrix and $\psi$ as the
characteristic generator of the distribution.


Remark 6.26. Knowledge of the distribution of $X$ does not uniquely determine its elliptical
representation $E_d(\mu, \Sigma, \psi)$. Although $\mu$ is uniquely determined, $\Sigma$ and $\psi$ are only
determined up to a positive constant. For example, the multivariate normal distribution
$N_d(\mu, \Sigma)$ can be written as $E_d(\mu, \Sigma, \psi(\cdot))$ or $E_d(\mu, c\Sigma, \psi(\cdot/c))$ for
$\psi(u) = e^{-u/2}$ and any $c > 0$. Provided that variances are finite, an elliptical
distribution is fully specified by its mean vector, covariance matrix and characteristic
generator, and it is possible to find an elliptical representation $E_d(\mu, \Sigma, \psi)$
such that $\Sigma$ is the covariance matrix of $X$, although this is not always the standard
representation of the distribution.


We now give an alternative stochastic representation for the elliptical distributions 
that follows directly from Definition 6.25 and Theorem 6.21. 


Proposition 6.27. $X \sim E_d(\mu, \Sigma, \psi)$ if and only if there exist $S$, $R$ and $A$ satisfying


$$X \stackrel{\text{d}}{=} \mu + RAS, \tag{6.40}$$

with
(i) $S$ uniformly distributed on the unit sphere $\mathcal{S}^{k-1} = \{s \in \mathbb{R}^k : s's = 1\}$,
(ii) $R \geq 0$, a radial rv, independent of $S$, and
(iii) $A \in \mathbb{R}^{d \times k}$ with $AA' = \Sigma$.
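
The representation (6.40) doubles as a simulation recipe. The sketch below, in Python with NumPy, applies it to the multivariate t distribution, using the fact from Example 6.23 that R²/d has an F(d, ν) distribution in that case. The function name `sample_elliptical_t`, the use of a Cholesky factor for A (so that k = d) and all parameter values are assumptions made for illustration.

```python
import numpy as np

def sample_elliptical_t(n, mu, Sigma, nu, rng):
    """Sample X = mu + R A S for the multivariate t, with A A' = Sigma."""
    mu = np.asarray(mu, dtype=float)
    d = mu.size
    A = np.linalg.cholesky(Sigma)                       # (iii): one choice of A with A A' = Sigma
    z = rng.standard_normal((n, d))
    S = z / np.linalg.norm(z, axis=1, keepdims=True)    # (i): uniform on the unit sphere
    R = np.sqrt(d * rng.f(d, nu, size=n))               # (ii): R^2 / d ~ F(d, nu), independent of S
    return mu + R[:, None] * (S @ A.T)

rng = np.random.default_rng(1)
mu = np.array([0.0, 1.0])                               # hypothetical location vector
Sigma = np.array([[1.0, 0.5], [0.5, 2.0]])              # hypothetical dispersion matrix
nu = 8.0
X = sample_elliptical_t(100_000, mu, Sigma, nu, rng)
print(X.mean(axis=0))                                   # close to mu
print(np.cov(X, rowvar=False))                          # close to nu / (nu - 2) * Sigma
```

Any other matrix A with AA' = Σ would give the same distribution; the Cholesky factor is simply convenient when Σ is positive definite.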


For practical examples we are most interested in the case where $\Sigma$ is positive
definite. The relation between the elliptical and spherical cases is then clearly


$$X \sim E_d(\mu, \Sigma, \psi) \iff \Sigma^{-1/2}(X - \mu) \sim S_d(\psi). \tag{6.41}$$


In this case, if the spherical vector $Y$ has density generator $g$, then $X = \mu + \Sigma^{1/2}Y$
has density

$$f(x) = \frac{1}{|\Sigma|^{1/2}}\, g((x - \mu)'\Sigma^{-1}(x - \mu)).$$

The joint density is always constant on sets of the form $\{x : (x - \mu)'\Sigma^{-1}(x - \mu) = c\}$,
which are ellipsoids in $\mathbb{R}^d$. Clearly, the full family of multivariate normal variance
mixtures with general location and dispersion parameters $\mu$ and $\Sigma$ are elliptical,
since they are obtained by affine transformations of the spherical special cases 
considered in the previous section. 

It follows from (6.37) and (6.41) that for a non-singular elliptical variate
$X \sim E_d(\mu, \Sigma, \psi)$ with no point mass at $\mu$, we have


$$\left(\sqrt{(X - \mu)'\Sigma^{-1}(X - \mu)},\; \frac{\Sigma^{-1/2}(X - \mu)}{\sqrt{(X - \mu)'\Sigma^{-1}(X - \mu)}}\right) \stackrel{\text{d}}{=} (R, S), \tag{6.42}$$

where $S$ is uniformly distributed on $\mathcal{S}^{d-1}$ and $R$ is an independent scalar rv. This
forms the basis of a test of elliptical symmetry described in Section 15.1.2.

The following proposition shows that a particular conditional distribution of an
elliptically distributed random vector $X$ has the same correlation matrix as $X$ and
can also be used to test for elliptical symmetry.


Proposition 6.28. Let $X \sim E_d(\mu, \Sigma, \psi)$ and assume that $\Sigma$ is positive definite
and $\operatorname{cov}(X)$ is finite. For any $c > 0$ such that $P((X - \mu)'\Sigma^{-1}(X - \mu) \geq c) > 0$,
we have

$$\rho(X \mid (X - \mu)'\Sigma^{-1}(X - \mu) \geq c) = \rho(X). \tag{6.43}$$


Proof. It follows easily from (6.42) that


$$X \mid (X - \mu)'\Sigma^{-1}(X - \mu) \geq c \;\stackrel{\text{d}}{=}\; \mu + R\Sigma^{1/2}S \mid R^2 \geq c,$$


where $R \stackrel{\text{d}}{=} \sqrt{(X - \mu)'\Sigma^{-1}(X - \mu)}$ and $S$ is independent of $R$ and uniformly
distributed on $\mathcal{S}^{d-1}$. Thus we have


$$X \mid (X - \mu)'\Sigma^{-1}(X - \mu) \geq c \;\stackrel{\text{d}}{=}\; \mu + \tilde{R}\Sigma^{1/2}S,$$


where $\tilde{R} \stackrel{\text{d}}{=} R \mid R^2 \geq c$. It follows from Proposition 6.27 that the conditional
distribution remains elliptical with dispersion matrix $\Sigma$ and that (6.43) holds.


6.3.3 Properties of Elliptical Distributions


We now summarize some of the properties of elliptical distributions in a format 
that allows their comparison with the properties of multivariate normal distributions 
in Section 6.1.3. Many properties carry over directly and others only need to be 
modified slightly. These parallels emphasize that it would be fairly easy to base 
many standard procedures in risk management on an assumption that risk-factor 
changes have an approximately elliptical distribution, rather than the patently false 
assumption that they are multivariate normal. 


Linear combinations. If we take linear combinations of elliptical random vectors,
then these remain elliptical with the same characteristic generator $\psi$. Let
$X \sim E_d(\mu, \Sigma, \psi)$ and take any $B \in \mathbb{R}^{k \times d}$ and $b \in \mathbb{R}^k$. Using a similar argument to that
in Proposition 6.9 it is then easily shown that


$$BX + b \sim E_k(B\mu + b, B\Sigma B', \psi). \tag{6.44}$$

As a special case, if $a \in \mathbb{R}^d$, then

$$a'X \sim E_1(a'\mu, a'\Sigma a, \psi). \tag{6.45}$$
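
The argument behind (6.44) is short enough to write out. The following sketch uses only the characteristic function of an elliptical distribution given after Definition 6.25, with $t$ denoting an arbitrary vector in $\mathbb{R}^k$.

```latex
\begin{align*}
\phi_{BX+b}(t) &= E\bigl(e^{it'(BX+b)}\bigr) = e^{it'b}\,\phi_X(B't) \\
 &= e^{it'b}\, e^{i(B't)'\mu}\, \psi\bigl((B't)'\Sigma(B't)\bigr) \\
 &= e^{it'(B\mu+b)}\, \psi\bigl(t'B\Sigma B't\bigr).
\end{align*}
```

The final expression is the characteristic function of a distribution with location vector $B\mu + b$, dispersion matrix $B\Sigma B'$ and unchanged generator $\psi$; taking $B = a'$ and $b = 0$ gives the special case (6.45).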


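Marginal distributions. It follows from (6.45) that marginal distributions of $X$
must be elliptical distributions with the same characteristic generator. Using the
$X = (X_1', X_2')'$ notation from Section 6.1.3 and again extending this notation naturally
to $\mu$ and $\Sigma$,

$$\mu = \begin{pmatrix}\mu_1 \\ \mu_2\end{pmatrix}, \qquad \Sigma = \begin{pmatrix}\Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22}\end{pmatrix},$$

we have that $X_1 \sim E_k(\mu_1, \Sigma_{11}, \psi)$ and $X_2 \sim E_{d-k}(\mu_2, \Sigma_{22}, \psi)$.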


Conditional distributions. The conditional distribution of $X_2$ given $X_1$ may also
be shown to be elliptical, although in general it will have a different characteristic
generator $\tilde{\psi}$. For details of how the generator changes see Fang, Kotz and Ng (1990,
pp. 45, 46). In the special case of multivariate normality the generator remains the
same.


Quadratic forms. If $X \sim E_d(\mu, \Sigma, \psi)$ with $\Sigma$ non-singular, then we observed
in (6.42) that

$$Q := (X - \mu)'\Sigma^{-1}(X - \mu) \stackrel{\text{d}}{=} R^2, \tag{6.46}$$

where $R$ is the radial rv in the stochastic representation (6.40). As we have seen
in Example 6.23, for some particular cases the distribution of $R^2$ is well known: if
$X \sim N_d(\mu, \Sigma)$, then $R^2 \sim \chi^2_d$; if $X \sim t_d(\nu, \mu, \Sigma)$, then $R^2/d \sim F(d, \nu)$. For all
elliptical distributions, $Q$ must be independent of $\Sigma^{-1/2}(X - \mu)/\sqrt{Q}$.


Convolutions. The convolution of two independent elliptical vectors with the same
dispersion matrix $\Sigma$ is also elliptical. If $X$ and $Y$ are independent $d$-dimensional
random vectors satisfying $X \sim E_d(\mu, \Sigma, \psi)$ and $Y \sim E_d(\tilde{\mu}, \Sigma, \tilde{\psi})$, then we may
take the product of characteristic functions to show that


$$X + Y \sim E_d(\mu + \tilde{\mu}, \Sigma, \bar{\psi}), \tag{6.47}$$

where $\bar{\psi}(u) = \psi(u)\tilde{\psi}(u)$.

If the dispersion matrices of $X$ and $Y$ differ by more than a constant factor, then
the convolution will not necessarily remain elliptical, even when the two generators
$\psi$ and $\tilde{\psi}$ are identical.
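
The product-of-characteristic-functions step can be made explicit. The sketch below writes $\tilde{\mu}$ and $\tilde{\psi}$ for the location vector and generator of $Y$ and $\bar{\psi}$ for the generator of the sum; this tilde and bar notation is an assumption where the printed symbols are unclear.

```latex
\begin{align*}
\phi_{X+Y}(t) &= \phi_X(t)\,\phi_Y(t)
  = e^{it'\mu}\,\psi(t'\Sigma t)\; e^{it'\tilde{\mu}}\,\tilde{\psi}(t'\Sigma t) \\
 &= e^{it'(\mu+\tilde{\mu})}\,\bar{\psi}(t'\Sigma t),
 \qquad \bar{\psi}(u) = \psi(u)\,\tilde{\psi}(u).
\end{align*}
```

If $Y$ instead had a dispersion matrix $\tilde{\Sigma}$ that is not a multiple of $\Sigma$, the product would involve the two different quadratic forms $t'\Sigma t$ and $t'\tilde{\Sigma} t$ and could not in general be written as a function of a single quadratic form, which is why ellipticity can be lost.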


6.3.4 Estimating Dispersion and Correlation 


Suppose we have risk-factor return data $X_1, \ldots, X_n$ that we believe come from some
elliptical distribution $E_d(\mu, \Sigma, \psi)$ with heavier tails than the multivariate normal.
We recall from Remark 6.26 that the dispersion matrix $\Sigma$ is not uniquely determined,
but rather is only fixed up to a constant of proportionality; when covariances are
finite, the covariance matrix is proportional to $\Sigma$.

In this section we briefly consider the problem of estimating the location parameter
$\mu$, a dispersion matrix $\Sigma$ and the correlation matrix $P$, assuming finiteness
of second moments. We could use the standard estimators of Section 6.1.2. Under
an assumption of iid or uncorrelated vector observations we observed that $\bar{X}$ and
$S$ in (6.9) are unbiased estimators of the mean vector and the covariance matrix,
respectively. They will also be consistent under quite weak assumptions. However, 
this does not necessarily mean they are the best estimators of location and dispersion 
for any given finite sample of elliptical data. There are many alternative estimators 
that may be more efficient for heavy-tailed data and may enjoy better robustness 
properties for contaminated data. 

One strategy would be to fit a number of normal variance mixture models, such as 
the t and NIG, using the approach of Section 6.2.4. From the best-fitting model we 
would obtain an estimate of the mean vector and could easily calculate the implied 
estimates of the covariance and correlation matrices. In this section we give simpler, 
alternative methods that do not require a full fitting of a multivariate distribution; 
consult Notes and Comments for further references to robust dispersion estimation. 


M-estimators. Maronna’s M-estimators (Maronna 1976) of location and dispersion
are a relatively old idea in robust statistics, but they have the virtue of being
particularly simple to implement. Let $\hat{\mu}$ and $\hat{\Sigma}$ denote estimates of the mean
vector and the dispersion matrix. Suppose for every observation $X_i$ we calculate
$D_i^2 = (X_i - \hat{\mu})'\hat{\Sigma}^{-1}(X_i - \hat{\mu})$. If we wanted to calculate improved estimates of
location and dispersion, particularly for heavy-tailed data, it might be expected
that this could be achieved by reducing the influence of observations for which $D_i$
is large, since these are the observations that might tend to distort the parameter
estimates most. M-estimation uses decreasing weight functions $w_j : \mathbb{R}^+ \to \mathbb{R}^+$,
$j = 1, 2$, to reduce the weight of observations with large $D_i$ values. This can be
turned into an iterative procedure that converges to so-called M-estimates of location
and dispersion; the dispersion matrix estimate is generally a biased estimate of
the true covariance matrix.


Algorithm 6.29 (M-estimators of location and dispersion).


(1) As starting estimates take $\hat{\mu}^{[1]} = \bar{X}$ and $\hat{\Sigma}^{[1]} = S$, the standard estimators
in (6.9). Set iteration count $k = 1$.


(2) For $i = 1, \ldots, n$ set $D_i^2 = (X_i - \hat{\mu}^{[k]})'(\hat{\Sigma}^{[k]})^{-1}(X_i - \hat{\mu}^{[k]})$.

(3) Update the location estimate using


$$\hat{\mu}^{[k+1]} = \frac{\sum_{i=1}^n w_1(D_i)X_i}{\sum_{i=1}^n w_1(D_i)},$$

where $w_1$ is a weight function, as discussed below.


(4) Update the dispersion matrix estimate using


$$\hat{\Sigma}^{[k+1]} = \frac{1}{n}\sum_{i=1}^n w_2(D_i^2)(X_i - \hat{\mu}^{[k]})(X_i - \hat{\mu}^{[k]})',$$

where $w_2$ is a weight function.


(5) Set $k = k + 1$ and repeat steps (2) to (4) until estimates converge.


Popular choices for the weight functions $w_1$ and $w_2$ are the decreasing functions
$w_1(x) = (d + \nu)/(x^2 + \nu) = w_2(x^2)$ for some positive constant $\nu$. Interestingly,
use of these weight functions in Algorithm 6.29 exactly corresponds to fitting a
multivariate $t_d(\nu, \mu, \Sigma)$ distribution with known degrees of freedom $\nu$ using the
EM algorithm (see, for example, Meng and van Dyk 1997).

There are many other possibilities for the weight functions. For example, the
observations in the central part of the distribution could be given full weight and
only the more outlying observations downweighted. This can be achieved by setting
$w_1(x) = 1$ for $x \leq a$, $w_1(x) = a/x$ for $x > a$, for some value $a$, and
$w_2(x^2) = (w_1(x))^2$.
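
The iteration in Algorithm 6.29 can be sketched in a few lines of Python with NumPy. This is an illustrative implementation under stated assumptions, not a reference one: the function name `m_estimate`, the convergence tolerance and the iteration cap are hypothetical, the t-type weights w1(x) = (d + ν)/(x² + ν) = w2(x²) are hard-coded, and step (4) is centred at the current location estimate.

```python
import numpy as np

def m_estimate(X, nu=4.0, tol=1e-8, max_iter=500):
    """M-estimates of location and dispersion with t-type weights (a sketch)."""
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    mu = X.mean(axis=0)                        # step (1): sample mean
    Sigma = np.cov(X, rowvar=False)            # step (1): sample covariance
    for _ in range(max_iter):
        diff = X - mu
        # step (2): D_i^2 = (X_i - mu)' Sigma^{-1} (X_i - mu)
        D2 = np.einsum("ij,ij->i", diff @ np.linalg.inv(Sigma), diff)
        w = (d + nu) / (D2 + nu)               # w1(D_i) = w2(D_i^2)
        mu_new = (w[:, None] * X).sum(axis=0) / w.sum()       # step (3)
        Sigma_new = (w[:, None] * diff).T @ diff / n          # step (4)
        converged = (np.abs(mu_new - mu).max() < tol
                     and np.abs(Sigma_new - Sigma).max() < tol)
        mu, Sigma = mu_new, Sigma_new          # step (5)
        if converged:
            break
    return mu, Sigma

# Illustrative use on simulated bivariate t data (all values are assumptions)
rng = np.random.default_rng(2)
n, d, nu = 20_000, 2, 4.0
Sigma_true = np.array([[1.0, 0.5], [0.5, 1.0]])
Z = rng.multivariate_normal(np.zeros(d), Sigma_true, size=n)
W = nu / rng.chisquare(nu, size=n)
X = np.sqrt(W)[:, None] * Z                    # sample from t_d(nu, 0, Sigma_true)
mu_hat, Sigma_hat = m_estimate(X, nu=nu)
print(mu_hat)
print(Sigma_hat)                               # roughly Sigma_true for this choice of nu
```

Because the weights here match the degrees of freedom used to simulate the data, the dispersion estimate targets the dispersion matrix of the t distribution rather than its covariance matrix, which is larger by the factor ν/(ν − 2).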


Correlation estimates via Kendall’s tau. A method for estimating correlation that 
is particularly easy to carry out is based on Kendall’s rank correlation coefficient; 
this method will turn out to be related to a method in Chapter 7 that is used for 
estimating the parameters of certain copulas. The theoretical version of Kendall’s 
rank correlation (also known as Kendall’s tau) for two rvs $X_1$ and $X_2$ is denoted by
$\rho_\tau(X_1, X_2)$ and is defined formally in Section 7.2.3; it is shown in Proposition 7.43
that if $(X_1, X_2) \sim E_2(\mu, \Sigma, \psi)$, then


$$\rho_\tau(X_1, X_2) = \frac{2}{\pi}\arcsin(\rho), \tag{6.48}$$


where $\rho = \sigma_{12}/(\sigma_{11}\sigma_{22})^{1/2}$ is the pseudo-correlation coefficient of the elliptical
distribution, which is always defined (even when correlation coefficients are undefined
because variances are infinite). This relationship can be inverted to provide a
method for estimating $\rho$ from data; we simply replace the left-hand side of (6.48)
by the standard textbook estimator of Kendall’s tau, which is given in (7.52), to
get an estimating equation that is solved for $\hat{\rho}$. This method estimates correlation
by exploiting the geometry of an elliptical distribution and does not require us to 
estimate variances and covariances. 
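
Inverting (6.48) gives the estimator sin(π τ̂/2), where τ̂ is a sample version of Kendall's tau. The sketch below, in Python with NumPy, implements this with the usual pairwise-sign form of the sample statistic and compares it with the Pearson estimator in a small simulation in the spirit of Example 6.30. The function names, the number of replications (1000 rather than 3000) and the random seed are illustrative assumptions.

```python
import numpy as np

def kendall_tau(x, y):
    """Sample Kendall's tau: average of sign((x_i - x_j)(y_i - y_j)) over pairs i != j."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    sx = np.sign(x[:, None] - x[None, :])
    sy = np.sign(y[:, None] - y[None, :])
    n = x.size
    return (sx * sy).sum() / (n * (n - 1))

def kendall_correlation(x, y):
    """Estimate rho by inverting (6.48): rho = sin(pi * tau / 2)."""
    return np.sin(0.5 * np.pi * kendall_tau(x, y))

rng = np.random.default_rng(3)
rho, nu, n, n_rep = 0.5, 3.0, 90, 1000
L = np.linalg.cholesky(np.array([[1.0, rho], [rho, 1.0]]))
pearson, kendall = [], []
for _ in range(n_rep):
    Z = rng.standard_normal((n, 2)) @ L.T
    W = nu / rng.chisquare(nu, size=n)
    X = np.sqrt(W)[:, None] * Z                # bivariate t sample with nu degrees of freedom
    pearson.append(np.corrcoef(X[:, 0], X[:, 1])[0, 1])
    kendall.append(kendall_correlation(X[:, 0], X[:, 1]))

rmse = lambda est: np.sqrt(np.mean((np.asarray(est) - rho) ** 2))
print(rmse(pearson), rmse(kendall))            # the second value is typically much smaller
```

The pairwise-sign computation takes of the order of n² operations, which is unproblematic for samples of this size; for long series a sorting-based implementation would be preferable.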

The method can be used to estimate a correlation matrix of a higher-dimensional
elliptical distribution by applying the technique to each bivariate margin. This does,
however, result in a matrix of pairwise correlation estimates that is not necessarily
positive definite; this problem does not always arise, and if it does, a matrix
adjustment method can be used, such as the eigenvalue method of Rousseeuw and
Molenberghs (1993), which is given in Algorithm 7.57.


Figure 6.4. For 3000 independent samples of size 90 from a bivariate t distribution with
three degrees of freedom and linear correlation 0.5: (a) the standard (Pearson) estimator of
correlation; (b) the Kendall’s tau transform estimator. See Example 6.30 for commentary.

Note that to turn an estimate of a bivariate correlation matrix into a robust estimate
of a dispersion matrix we could estimate the ratio of standard deviations
$\lambda = (\sigma_{22}/\sigma_{11})^{1/2}$, e.g. by using a ratio of trimmed sample standard deviations;
in other words, we leave out an equal number of outliers from each of the univariate
data sets $X_{1,i}, \ldots, X_{n,i}$ for $i = 1, 2$ and calculate the sample standard deviations
with the remaining observations. This would give us the estimate

$$\hat{\Sigma} = \begin{pmatrix} 1 & \hat{\lambda}\hat{\rho} \\ \hat{\lambda}\hat{\rho} & \hat{\lambda}^2 \end{pmatrix}. \tag{6.49}$$


Example 6.30 (efficient correlation estimation for heavy-tailed data). Suppose 
we calculate correlations of asset or risk-factor returns based on 90 days (somewhat 
more than three trading months) of data; it would seem that this ought to be enough 
data to allow us to accurately estimate the “true” underlying correlation under an 
assumption that we have identically distributed data for that period. 

Figure 6.4 displays the results of a simulation experiment where we have generated 
3000 bivariate samples of iid data from a t distribution with three degrees of freedom
and correlation $\rho = 0.5$; this is a heavy-tailed elliptical distribution. The distribution
of the values of the standard correlation coefficient (also known as the Pearson
correlation coefficient) is not particularly closely concentrated around the true value
and produces some very poor estimates for a number of samples. On the other hand,
the Kendall’s tau transform method produces estimates that are generally much
closer to the true value, and thus provides a more efficient way of estimating $\rho$.


Notes and Comments


A comprehensive reference for spherical and elliptical distributions is Fang, Kotz and 
Ng (1990); we have based our brief presentation of the theory on this account. Other 
references for the theory are Kelker (1970), Cambanis, Huang and Simons (1981) 
and Bingham and Kiesel (2002), the latter in the context of financial modelling. The 
original reference for Theorem 6.21 is Schoenberg (1938). Frahm (2004) suggests 
a generalization of the elliptical class to allow asymmetric models while preserving 
many of the attractive properties of elliptical distributions. For a more historical 
discussion (going back to Archimedes) and some surprising properties of the uniform 
distribution on the unit d-sphere, see Letac (2004). 

There is a vast literature on alternative estimators of dispersion and correlation 
matrices, particularly with regard to better robustness properties. Textbooks with 
relevant sections include Hampel et al. (1986), Marazzi (1993), Wilcox (1997) and 
Huber and Ronchetti (2009); the last of those books is recommended more generally 
for applications of robust statistics in econometrics and finance. 

We have concentrated on M-estimation of dispersion matrices, since this is related 
to the maximum likelihood estimation of alternative elliptical models. M-estimators 
have a relatively long history and are known to have good local robustness properties 
(insensitivity to small data perturbations); they do, however, have relatively low 
breakdown points in high dimensions, so their performance can be poor when data 
are more contaminated. A small selection of papers on M-estimation is Maronna 
(1976), Devlin, Gnanadesikan and Kettenring (1975, 1981) and Tyler (1983, 1987); 
see also Frahm (2004), in which an interesting alternative derivation of a Tyler 
estimator is given. The method based on Kendall’s tau was suggested in Lindskog, 
McNeil and Schmock (2003). 


6.4 Dimension-Reduction Techniques 


The techniques of dimension reduction, such as factor modelling and principal com- 
ponents, are central to multivariate statistical analysis and are widely used in econo- 
metric model building. In the high-dimensional world of financial risk management 
they are essential tools. 


6.4.1 Factor Models 


By using a factor model we attempt to explain the randomness in the components 
of a d-dimensional vector X in terms of a smaller set of common factors. If the 
components of X represent, for example, equity returns, it is clear that a large part 
of their variation can be explained in terms of the variation of a smaller set of market 
index returns. Formally, we define a factor model as follows. 


Definition 6.31 (linear factor model). The random vector $X$ is said to follow a
$p$-factor model if it can be decomposed as


$$X = a + BF + \varepsilon, \tag{6.50}$$

where


(i) $F = (F_1, \ldots, F_p)'$ is a random vector of common factors with $p < d$ and a
covariance matrix that is positive definite,


(ii) $\varepsilon = (\varepsilon_1, \ldots, \varepsilon_d)'$ is a random vector of idiosyncratic error terms, which are
uncorrelated and have mean 0,


(iii) $B \in \mathbb{R}^{d \times p}$ is a matrix of constant factor loadings and $a \in \mathbb{R}^d$ is a vector of
constants, and


(iv) $\operatorname{cov}(F, \varepsilon) = E((F - E(F))\varepsilon') = 0$.


The assumptions that the errors are uncorrelated with each other (ii) and also 
with the common factors (iv) are important parts of this definition. We do not in 
general require independence, only uncorrelatedness. However, if the vector X is 
multivariate normally distributed and follows the factor model in (6.50), then it is 
possible to find a version of the factor model where $F$ and $\varepsilon$ are Gaussian and the
errors can be assumed to be mutually independent and independent of the common 
factors. We elaborate on this assertion in Example 6.32 below. 

It follows from the basic assumptions that factor models imply a special structure
for the covariance matrix $\Sigma$ of $X$. If we denote the covariance matrix of $F$ by $\Omega$
and that of $\varepsilon$ by the diagonal matrix $\Upsilon$, it follows that


$$\Sigma = \operatorname{cov}(X) = B\Omega B' + \Upsilon. \tag{6.51}$$


If the factor model holds, the common factors can always be transformed so that
they have mean 0 and are orthogonal. By setting $F^* = \Omega^{-1/2}(F - E(F))$ and
$B^* = B\Omega^{1/2}$ we have a representation of the factor model of the form $X = \mu + B^*F^* + \varepsilon$,
where $\mu = E(X)$, as usual, and $\Sigma = B^*(B^*)' + \Upsilon$.

Conversely, it can be shown that whenever a random vector $X$ has a covariance
matrix that satisfies

$$\Sigma = BB' + \Upsilon \tag{6.52}$$


for some $B \in \mathbb{R}^{d \times p}$ with $\operatorname{rank}(B) = p < d$ and diagonal matrix $\Upsilon$, then $X$ has a
factor-model representation for some $p$-dimensional factor vector $F$ and $d$-dimensional
error vector $\varepsilon$.
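
The covariance structure (6.51) and the transformation to orthogonal factors can be checked numerically. The following Python sketch uses NumPy with made-up loadings and covariance matrices; every numerical value is an assumption for illustration, and the symmetric square root of the factor covariance matrix is one of several valid choices.

```python
import numpy as np

rng = np.random.default_rng(4)
d, p = 5, 2
B = rng.standard_normal((d, p))                  # hypothetical factor loadings
Omega = np.array([[2.0, 0.6], [0.6, 1.0]])       # hypothetical covariance matrix of F
Upsilon = np.diag(rng.uniform(0.5, 1.5, d))      # diagonal covariance matrix of the errors

Sigma = B @ Omega @ B.T + Upsilon                # covariance structure (6.51)

# Symmetric square root of Omega via its eigendecomposition
vals, vecs = np.linalg.eigh(Omega)
Omega_half = vecs @ np.diag(np.sqrt(vals)) @ vecs.T
B_star = B @ Omega_half                          # loadings on the orthogonal factors F*

print(np.allclose(Sigma, B_star @ B_star.T + Upsilon))   # True
```

The transformed loadings reproduce the same covariance matrix in the form (6.52), so the choice between correlated and orthogonal factors is a matter of convenience rather than substance.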


Example 6.32 (equicorrelation model). Suppose $X$ is a random vector with standardized
margins (zero mean and unit variance) and an equicorrelation matrix; in
other words, the correlation between each pair of components is equal to $\rho > 0$.
This means that the covariance matrix $\Sigma$ can be written as $\Sigma = \rho J_d + (1 - \rho)I_d$,
where $J_d$ is the $d$-dimensional square matrix of ones and $I_d$ is the identity matrix,
so that $\Sigma$ is obviously of the form (6.52) for the $d$-vector $B = \sqrt{\rho}\,1$.

To find a factor decomposition of $X$, take any zero-mean, unit-variance rv $Y$ that
is independent of $X$ and define a single common factor $F$ and errors $\varepsilon$ by


$$F = \frac{\sqrt{\rho}}{1 + \rho(d - 1)}\sum_{j=1}^d X_j + \sqrt{\frac{1 - \rho}{1 + \rho(d - 1)}}\,Y, \qquad \varepsilon_j = X_j - \sqrt{\rho}\,F,$$

where we note that in this construction $F$ also has mean 0 and variance 1. We
therefore have the factor decomposition $X = BF + \varepsilon$, and it may be verified by
calculation that $\operatorname{cov}(F, \varepsilon_j) = 0$ for all $j$ and $\operatorname{cov}(\varepsilon_j, \varepsilon_k) = 0$ when $j \neq k$, so
that the requirements of Definition 6.31 are satisfied. A random vector with an
equicorrelation matrix can be thought of as following a factor model with a single
common factor.

Since we can take any $Y$, the factors and errors in this decomposition are non-unique.
Consider the case where the vector $X$ is Gaussian; it is most convenient
to take $Y$ to also be Gaussian, since in that case the common factor is normally
distributed, the error vector is multivariate normally distributed, $F$ is independent
of $\varepsilon_j$, for all $j$, and $\varepsilon_j$ and $\varepsilon_k$ are independent for $j \neq k$. Since $\operatorname{var}(\varepsilon_j) = 1 - \rho$, it
is most convenient to write the factor model implied by the equicorrelation model
as


$$X_j = \sqrt{\rho}\,F + \sqrt{1 - \rho}\,Z_j, \qquad j = 1, \ldots, d, \tag{6.53}$$


where $F, Z_1, \ldots, Z_d$ are mutually independent standard Gaussian rvs. This model
will be used in Section 11.1.5 in the context of modelling homogeneous credit portfolios.
For the more general construction on which this example is based, see Mardia,
Kent and Bibby (1979, Exercise 9.2.2).
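
The one-factor Gaussian form of the equicorrelation model is easy to simulate. The Python sketch below, using NumPy, generates each component as the square root of ρ times a common standard normal factor plus the square root of 1 − ρ times an independent standard normal term, and compares the sample covariance matrix with the equicorrelation matrix. The sample size, dimension and value of ρ are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
n, d, rho = 200_000, 4, 0.3                      # illustrative values (assumptions)

F = rng.standard_normal(n)                       # single common factor
Z = rng.standard_normal((n, d))                  # independent idiosyncratic terms
X = np.sqrt(rho) * F[:, None] + np.sqrt(1 - rho) * Z

# Equicorrelation matrix: rho off the diagonal, 1 on the diagonal
target = rho * np.ones((d, d)) + (1 - rho) * np.eye(d)
print(np.abs(np.cov(X, rowvar=False) - target).max())    # small for large n
```

Each component has unit variance because ρ + (1 − ρ) = 1, and any two components share only the common factor, which produces the covariance ρ.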


6.4.2 Statistical Estimation Strategies 


Now assume that we have data $X_1, \ldots, X_n \in \mathbb{R}^d$ representing risk-factor changes
at times $t = 1, \ldots, n$. Each vector observation $X_t$ is assumed to be a realization
from a factor model of the form (6.50) so that we have


$$X_t = a + BF_t + \varepsilon_t, \qquad t = 1, \ldots, n, \tag{6.54}$$


for common-factor vectors $F_t = (F_{t,1}, \ldots, F_{t,p})'$, error vectors $\varepsilon_t$, a vector of
constants $a \in \mathbb{R}^d$, and loading matrix $B \in \mathbb{R}^{d \times p}$. There are occasionally situations
where we might wish to model $a$ and $B$ as time dependent, but mostly they are
assumed to be fixed over time.

The model (6.54) is clearly an idealization. Data will seldom be perfectly 
explained by a factor model; the aim is to find an approximating factor model 
that captures the main sources of variability in the data. Three general types of 
factor model are commonly used in financial risk applications; they are known as 
macroeconomic, fundamental and statistical factor models. 


Macroeconomic factor models. In these models we assume that appropriate factors
$F_t$ are also observable and we collect time-series data $F_1, \ldots, F_n \in \mathbb{R}^p$. The name
comes from the fact that, in many applications of these models in economics and
finance, the observed factors are macroeconomic variables, such as changes in GDP,
inflation and interest rates.

A simple example of a macroeconomic model in finance is Sharpe’s single-index
model, where $F_1, \ldots, F_n$ are observations of the return on a market index and
$X_1, \ldots, X_n$ are individual equity returns that are explained in terms of the market
return. Fitting of the model (estimation of $B$ and $a$) is accomplished by time-series
regression techniques; it is described in Section 6.4.3.

Fundamental factor models. In contrast to the macroeconomic factor models, here
we assume that the loading matrix $B$ is known but that the underlying factors
$F_t$ are unobserved. Factor values $F_1, \ldots, F_n$ have to be estimated from the data
$X_1, \ldots, X_n$ using cross-sectional regression at each time point.

The name comes from applications in modelling equity returns where the stocks 
are classified according to their “fundamentals”, such as country, industry sector 
and size (small cap, large cap, etc.). These are generally categorical variables and it 
is assumed that there are underlying, unobserved factors associated with each level 
of the categorical variable, e.g. a factor for each country or each industry sector. 

If each risk-factor change $X_{t,j}$ can be identified with a unique set of values for
the fundamentals, e.g. a unique country or industry, then the matrix $B$ is a matrix
consisting of zeros and ones. If $X_{t,j}$ is attributed to different values of the fundamental
variable, then $B$ might contain factor weights summing to 1; for example, 60%
of a stock return for a multinational company might be attributed to an unobserved
US factor and 40% to an unobserved UK factor. There may also be situations in
fundamental factor modelling where time-dependent loading matrices $B_t$ are used.


Statistical factor models. In these models we observe neither the factors $F_t$ nor
the loadings $B$. Instead, we use statistical techniques to estimate both from the data
$X_1, \ldots, X_n$. This can be a very powerful approach to explaining the variability in
data, but we note that the factors we obtain, while being explanatory in a statistical 
sense, may not have any obvious interpretation. 

There are two general methods for finding factors. The first method, which is quite 
common in finance, is to use principal component analysis to construct factors; we 
discuss this technique in detail in Section 6.4.5. The second method, classical stat- 
istical factor analysis, is less commonly used in finance (see Notes and Comments). 


Factor models and systematic risk. In the context of risk management, the goal
of all approaches to factor modelling is either to identify or to estimate appropriate
factor data $F_1, \ldots, F_n$. If this is achieved, we can then concentrate on modelling
the distribution or dynamics of the factors, which is a lower-dimensional problem
than modelling $X_1, \ldots, X_n$.

The factors describe the systematic risk and are of primary importance. The unobserved
errors $\varepsilon_1, \ldots, \varepsilon_n$ describe the idiosyncratic risk and are of secondary importance.
In situations where we have many risk factors, the risk embodied in the errors
is partly mitigated by a diversification effect, whereas the risk embodied in the common
factors remains. The following simple example gives an idea why this is the
case.


Example 6.33. We continue our analysis of the one-factor model in Example 6.32.
Suppose that the random vector $X$ in that example represents the return on $d$ different
companies so that the rv $Z_{(d)} = (1/d) \sum_{j=1}^d X_j$ can be thought of as the portfolio
return for an equal investment in each of the companies. We calculate that


$$Z_{(d)} = \frac{1}{d} 1'BF + \frac{1}{d} 1'\varepsilon = \sqrt{\rho}\, F + \frac{1}{d} \sum_{j=1}^d \varepsilon_j.$$


The risk in the first term is not affected by increasing the size of the portfolio $d$,
whereas the risk in the second term can be reduced. Suppose we measure risk by
simply calculating variances; we get


$$\operatorname{var}(Z_{(d)}) = \rho + \frac{1 - \rho}{d} \to \rho, \quad d \to \infty,$$


showing that the systematic factor is the main contributor to the risk in a
large-portfolio situation.
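
The variance calculation in Example 6.33 can be checked with a few lines of Python. This is only a sketch under stated assumptions: we assume, as the formulas above suggest, that $X_j = \sqrt{\rho} F + \varepsilon_j$ with $\operatorname{var}(F) = 1$, $\operatorname{var}(\varepsilon_j) = 1 - \rho$ and all of $F, \varepsilon_1, \ldots, \varepsilon_d$ mutually uncorrelated; the function names and the value $\rho = 0.3$ are illustrative and not taken from the text.

```python
def equal_weight_variance(rho: float, d: int) -> float:
    """Variance of Z_(d) = (1/d) * sum_j X_j under the assumed one-factor model."""
    systematic = rho                   # var(sqrt(rho) * F): unaffected by d
    idiosyncratic = (1.0 - rho) / d    # var((1/d) * sum_j eps_j): shrinks like 1/d
    return systematic + idiosyncratic


def equal_weight_variance_from_cov(rho: float, d: int) -> float:
    """Same quantity computed as w' Sigma w for an equicorrelation matrix Sigma."""
    w = 1.0 / d
    total = 0.0
    for i in range(d):
        for j in range(d):
            # Unit variances on the diagonal, correlation rho elsewhere.
            total += w * w * (1.0 if i == j else rho)
    return total


# Illustrative value rho = 0.3 (an assumption, not a number from the text).
for d in (1, 10, 100, 1000):
    print(d, round(equal_weight_variance(0.3, d), 4))
```

As $d$ grows, the printed variances approach $\rho$: the idiosyncratic part is diversified away while the systematic part remains.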


6.4.3 Estimating Macroeconomic Factor Models 


Two equivalent approaches may be used to estimate the model parameters in a 
macroeconomic factor model of the form (6.54). In the first approach we perform 
d univariate regression analyses, one for each component of the individual return 
series. In the second approach we estimate all parameters in a single multivariate 
regression. 


Univariate regression. Writing $X_{t,j}$ for the observation at time $t$ of instrument $j$,
we consider the univariate regression model


$$X_{t,j} = a_j + b_j' F_t + \varepsilon_{t,j}, \quad t = 1, \ldots, n.$$


This is known as a time-series regression, since the responses $X_{1,j}, \ldots, X_{n,j}$ form
a univariate time series and the factors $F_1, \ldots, F_n$ form a possibly multivariate time
series. Without going into technical details we simply remark that the parameters $a_j$
and $b_j$ are estimated using the standard ordinary least-squares (OLS) method found
in all textbooks on linear regression. To justify the use of the method and to derive
statistical properties of the method it is usually assumed that, conditional on the
factors, the errors $\varepsilon_{1,j}, \ldots, \varepsilon_{n,j}$ are identically distributed and serially uncorrelated.
In other words, they form a white noise process as defined in Chapter 4.

The estimate $\hat{a}_j$ obviously estimates the $j$th component of $a$, while $\hat{b}_j$ is an
estimate of the $j$th row of the matrix $B$. By performing a regression for each of the
univariate time series $X_{1,j}, \ldots, X_{n,j}$ for $j = 1, \ldots, d$, we complete the estimation
of the parameters $a$ and $B$.
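
The instrument-by-instrument procedure can be sketched in Python with NumPy as follows. This is a minimal illustration rather than a definitive implementation: the function name `fit_univariate_regressions` is hypothetical, and the data are simulated with made-up parameter values purely so that the estimates can be compared with known inputs.

```python
import numpy as np

def fit_univariate_regressions(X, F):
    """OLS time-series regression of each column of X on the factors F.

    X : (n, d) array of risk-factor changes, one column per instrument.
    F : (n, p) array of observed factors.
    Returns (a_hat, B_hat) with shapes (d,) and (d, p).
    """
    n, d = X.shape
    design = np.column_stack([np.ones(n), F])    # intercept column plus factors
    a_hat = np.empty(d)
    B_hat = np.empty((d, F.shape[1]))
    for j in range(d):
        # Least-squares fit of X[:, j] on a constant and the factors.
        coef, *_ = np.linalg.lstsq(design, X[:, j], rcond=None)
        a_hat[j] = coef[0]                       # estimate of the jth component of a
        B_hat[j, :] = coef[1:]                   # estimate of the jth row of B
    return a_hat, B_hat


# Simulated data; all numbers below are illustrative assumptions.
rng = np.random.default_rng(0)
n, d, p = 500, 4, 2
F = rng.normal(size=(n, p))
a_true = np.array([0.1, -0.2, 0.0, 0.3])
B_true = rng.normal(size=(d, p))
X = a_true + F @ B_true.T + 0.1 * rng.normal(size=(n, d))

a_hat, B_hat = fit_univariate_regressions(X, F)
print(np.round(B_hat - B_true, 3))               # estimation errors in the loadings
```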


Multivariate regression. To set the problem up as a multivariate linear-regression 
problem, we construct a number of large matrices: 


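$$X = \underbrace{\begin{pmatrix} X_1' \\ \vdots \\ X_n' \end{pmatrix}}_{n \times d}, \quad F = \underbrace{\begin{pmatrix} 1 & F_1' \\ \vdots & \vdots \\ 1 & F_n' \end{pmatrix}}_{n \times (p+1)}, \quad B_2 = \underbrace{\begin{pmatrix} a' \\ B' \end{pmatrix}}_{(p+1) \times d}, \quad E = \underbrace{\begin{pmatrix} \varepsilon_1' \\ \vdots \\ \varepsilon_n' \end{pmatrix}}_{n \times d}.$$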


Each row of the data X corresponds to a vector observation at a fixed time point t, 
and each column corresponds to a univariate time series for one of the individual 
returns. The model (6.54) can then be expressed by the matrix equation 


$$X = F B_2 + E, \tag{6.55}$$


where $B_2$ is the matrix of regression parameters to be estimated.

If we assume that the unobserved error vectors $\varepsilon_1, \ldots, \varepsilon_n$ comprising the rows of
$E$ are identically distributed and serially uncorrelated, conditional on $F_1, \ldots, F_n$,
then the equation (6.55) defines a standard multivariate linear regression (see, for
example, Mardia, Kent and Bibby (1979) for the standard assumptions). An estimate
of $B_2$ is obtained by multivariate OLS according to the formula


$$\hat{B}_2 = (F'F)^{-1} F'X. \tag{6.56}$$


The factor model is now essentially calibrated, since we have estimates for $a$
and $B$. The model can now be critically examined with respect to the original
conditions of Definition 6.31. Do the error vectors $\varepsilon_t$ come from a distribution with
diagonal covariance matrix, and are they uncorrelated with the factors?

To learn something about the errors we can form the model residual matrix
$\hat{E} = X - F\hat{B}_2$. Each row of this matrix contains an inferred value of an error vector
$\hat{\varepsilon}_t$ at a fixed point in time. Examination of the sample correlation matrix of these
inferred error vectors will hopefully show that there is little remaining correlation
in the errors (or at least much less than in the original data vectors $X_t$). If this is the
case, then the diagonal elements of the sample covariance matrix of the $\hat{\varepsilon}_t$ could
be taken as an estimator $\hat{\Upsilon}$ for $\Upsilon$. It is sometimes of interest to form the covariance
matrix implied by the factor model and compare this with the original sample
covariance matrix $S$ of the data. The implied covariance matrix is


$$\hat{\Sigma}^{(F)} = \hat{B}\hat{\Omega}\hat{B}' + \hat{\Upsilon}, \quad \text{where } \hat{\Omega} = \frac{1}{n-1} \sum_{t=1}^n (F_t - \bar{F})(F_t - \bar{F})'.$$

We would hope that $\hat{\Sigma}^{(F)}$ captures much of the structure of $S$ and that the correlation
matrix $R^{(F)} := \wp(\hat{\Sigma}^{(F)})$ captures much of the structure of the sample correlation
matrix $R = \wp(S)$.
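
The multivariate regression and the diagnostic quantities just described can be sketched in Python with NumPy. This is an illustration under stated assumptions, not the implementation used for the examples in this chapter: the function names are hypothetical, the data are simulated from a single-factor model with made-up parameters, and the residual variances are computed with denominator $n - 1$ so that they are on the same footing as $\hat{\Omega}$.

```python
import numpy as np

def fit_multivariate_regression(X, F):
    """Multivariate OLS as in (6.56), plus the covariance implied by the factor model.

    X : (n, d) data matrix, F : (n, p) observed factors (without the column of 1s).
    """
    n = X.shape[0]
    F_design = np.column_stack([np.ones(n), F])            # n x (p+1)
    # Solve (F'F) B2 = F'X instead of forming the inverse explicitly.
    B2_hat = np.linalg.solve(F_design.T @ F_design, F_design.T @ X)
    a_hat, B_hat = B2_hat[0, :], B2_hat[1:, :].T           # B_hat is d x p
    E_hat = X - F_design @ B2_hat                          # residual matrix
    # Keep only the diagonal of the residual covariance matrix.
    Upsilon_hat = np.diag(np.var(E_hat, axis=0, ddof=1))
    Omega_hat = np.atleast_2d(np.cov(F, rowvar=False))     # factor covariance, 1/(n-1)
    Sigma_F = B_hat @ Omega_hat @ B_hat.T + Upsilon_hat
    return a_hat, B_hat, E_hat, Sigma_F


def cov_to_corr(Sigma):
    """Correlation matrix belonging to a covariance matrix."""
    s = np.sqrt(np.diag(Sigma))
    return Sigma / np.outer(s, s)


# Simulated single-factor data; all numbers are illustrative assumptions.
rng = np.random.default_rng(1)
n, d = 1000, 5
F = rng.normal(size=(n, 1))
B_true = np.linspace(0.8, 1.2, d).reshape(d, 1)
X = F @ B_true.T + rng.normal(scale=0.5, size=(n, d))

a_hat, B_hat, E_hat, Sigma_F = fit_multivariate_regression(X, F)
R_F = cov_to_corr(Sigma_F)                   # correlation matrix implied by the model
R = np.corrcoef(X, rowvar=False)             # sample correlation matrix of the data
R_resid = np.corrcoef(E_hat, rowvar=False)   # correlation matrix of the residuals
```

Comparing `R_F` with `R`, and inspecting the off-diagonal entries of `R_resid`, mirrors the checks suggested in the text.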


Example 6.34 (single-index model for Dow Jones 30 returns). As a simple exam- 
ple of the regression approach to fitting factor models we have fitted a single factor 
model to a set of ten Dow Jones 30 daily stock-return series from 1992 to 1998. Note 
that these are different returns to those analysed in previous sections of this chapter. 
They have been chosen to be of two types: technology-related titles such as Hewlett- 
Packard, Intel, Microsoft and IBM; and food- and consumer-related titles such as 
Philip Morris, Coca-Cola, Eastman Kodak, McDonald’s, Wal-Mart and Disney. The 
factor chosen is the corresponding return on the Dow Jones 30 index itself. 

The estimate of $B$ implied by formula (6.56) is shown in the first line of Table 6.5.
The highest values of $B$ correspond to so-called high-beta stocks; since a one-factor
model implies the relationship $E(X_j) = a_j + B_j E(F)$, these stocks potentially
offer high expected returns relative to the market (but are often riskier titles); in
this case, the four technology-related stocks have the highest beta values. In the
second row, values of $r^2$, the so-called coefficient of determination, are given for
each of the univariate regression models. This number measures the strength of the
regression relationship between $X_j$ and $F$ and can be interpreted as the proportion
of the variation of the stock return that is explained by variation in the market return;

Table 6.5. The first line gives estimates of $B$ for a multivariate regression model fitted to
ten Dow Jones 30 stocks where the observed common factor is the return on the Dow Jones
30 index itself. The second row gives $r^2$ values for a univariate regression model for each
individual time series. The next ten lines of the table give the sample correlation matrix of
the data $R$, while the middle ten lines give the correlation matrix implied by the factor model.
The final ten lines show the estimated correlation matrix of the residuals from the regression
model, with entries less than 0.1 in absolute value being omitted. See Example 6.34 for full
details.

           MO     KO     EK    HWP   INTC   MSFT    IBM    MCD    WMT    DIS
B        0.87   1.01   0.77   1.12   1.12   1.11   1.07   0.86   1.02   1.03
r^2      0.17   0.33   0.14   0.18   0.17   0.21   0.22   0.23   0.24   0.26

MO       1.00   0.27   0.14   0.17   0.16   0.25   0.18   0.22   0.16   0.22
KO       0.27   1.00   0.17   0.22   0.21   0.25   0.18   0.36   0.33   0.32
EK       0.14   0.17   1.00   0.17   0.17   0.18   0.15   0.14   0.17   0.16
HWP      0.17   0.22   0.17   1.00   0.42   0.38   0.36   0.20   0.22   0.23
INTC     0.16   0.21   0.17   0.42   1.00   0.53   0.36   0.19   0.22   0.21
MSFT     0.25   0.25   0.18   0.38   0.53   1.00   0.33   0.22   0.28   0.26
IBM      0.18   0.18   0.15   0.36   0.36   0.33   1.00   0.20   0.20   0.20
MCD      0.22   0.36   0.14   0.20   0.19   0.22   0.20   1.00   0.26   0.26
WMT      0.16   0.33   0.17   0.22   0.22   0.28   0.20   0.26   1.00   0.28
DIS      0.22   0.32   0.16   0.23   0.21   0.26   0.20   0.26   0.28   1.00

MO       1.00   0.24   0.16   0.18   0.17   0.19   0.20   0.20   0.20   0.21
KO       0.24   1.00   0.22   0.24   0.23   0.26   0.27   0.28   0.28   0.29
EK       0.16   0.22   1.00   0.16   0.15   0.17   0.18   0.18   0.18   0.19
HWP      0.18   0.24   0.16   1.00   0.17   0.19   0.20   0.20   0.21   0.22
INTC     0.17   0.23   0.15   0.17   1.00   0.19   0.19   0.19   0.20   0.21
MSFT     0.19   0.26   0.17   0.19   0.19   1.00   0.22   0.22   0.22   0.23
IBM      0.20   0.27   0.18   0.20   0.19   0.22   1.00   0.23   0.23   0.24
MCD      0.20   0.28   0.18   0.20   0.19   0.22   0.23   1.00   0.23   0.24
WMT      0.20   0.28   0.18   0.21   0.20   0.22   0.23   0.23   1.00   0.25
DIS      0.21   0.29   0.19   0.22   0.21   0.23   0.24   0.24   0.25   1.00

MO       1.00
KO              1.00                              -0.12   0.12
EK                     1.00
HWP                           1.00   0.30   0.24   0.20
INTC                          0.30   1.00   0.43   0.20
MSFT                          0.24   0.43   1.00   0.14
IBM            -0.12          0.20   0.20   0.14   1.00
MCD             0.12                                      1.00
WMT                                                              1.00
DIS                                                                     1.00


the highest $r^2$ corresponds to Coca-Cola (33%), and in general it seems that about
20% of individual stock-return variation is explained by market-return variation.

The next ten lines of the table give the sample correlation matrix of the data $R$,
while the middle ten lines give the correlation matrix implied by the factor model
(corresponding to $\hat{\Sigma}^{(F)}$). The latter matrix picks up much, but not all, of the structure
of the former matrix. The final ten lines show the estimated correlation matrix of
the residuals from the regression model, but only those elements that exceed 0.1 in 
absolute value. The residuals are indeed much less correlated than the original data, 
but a few larger entries indicate imperfections in the factor-model representation 
of the data, particularly for the technology stocks. The index return for the broader 
market is clearly an important common factor, but further systematic effects that are 
not captured by the index appear to be present in these data. 


6.4.4 Estimating Fundamental Factor Models 


To estimate a fundamental factor model we consider, at each time point $t$, a
cross-sectional regression model of the form


$$X_t = B F_t + \varepsilon_t, \tag{6.57}$$


where $X_t \in \mathbb{R}^d$ are the risk-factor change data, $B \in \mathbb{R}^{d \times p}$ is a known matrix of
factor loadings (which may be time dependent in some applications), $F_t \in \mathbb{R}^p$ are
the factors to be estimated, and $\varepsilon_t$ are errors with diagonal covariance matrix $\Upsilon$.
There is no need for an intercept $a$ in the estimation of a fundamental factor model,
as this can be absorbed into the factor estimates.

To obtain precision in the estimation of $F_t$, the dimension $d$ of the risk-factor
vector needs to be large with respect to the number of factors $p$ to be estimated.
Note also that the components of the error vector $\varepsilon_t$ cannot generally be assumed to
have equal variance, so (6.57) is a regression problem with so-called heteroscedastic
errors.

We recall that, in typical applications in equity return modelling, the factors are 
frequently identified with country, industry-sector and company-size effects. The 
rows of the matrix $B$ can consist of zeros and ones, if $X_{t,j}$ is associated with a
single country or industry sector, or weights, if $X_{t,j}$ is attributed to more than one
country or industry sector. This kind of interpretation for the factors is also quite 
common in the factor models used for modelling portfolio credit risk, as we discuss 
in Section 11.5.1. 

Unbiased estimators of the factors $F_t$ may be obtained by forming the OLS
estimates

$$\hat{F}_t^{\mathrm{OLS}} = (B'B)^{-1} B' X_t,$$


and these are the best linear unbiased estimates in the case where the errors are
homoscedastic, so that $\Upsilon = \upsilon^2 I_d$ for some scalar $\upsilon$. However, in general, the OLS
estimates are not efficient and it is possible to obtain linear unbiased estimates with
a smaller covariance matrix using the method of generalized least squares (GLS).
If $\Upsilon$ were a known matrix, the GLS estimates would be given by


$$\hat{F}_t^{\mathrm{GLS}} = (B' \Upsilon^{-1} B)^{-1} B' \Upsilon^{-1} X_t. \tag{6.58}$$


In practice, we replace $\Upsilon$ in (6.58) with an estimate $\hat{\Upsilon}$ obtained as follows. Under
an assumption that the model (6.57) holds at every time point $t = 1, \ldots, n$, we first
carry out OLS estimation at each time $t$ and form the model residual vectors


$$\hat{\varepsilon}_t = X_t - B \hat{F}_t^{\mathrm{OLS}}, \quad t = 1, \ldots, n.$$

We then form the sample covariance matrix of the residuals $\hat{\varepsilon}_1, \ldots, \hat{\varepsilon}_n$. This matrix
should be approximately diagonal, if the factor model assumption holds. We can set
off-diagonal elements equal to zero to form an estimate of $\Upsilon$.

We give an example of the estimation of a fundamental factor model in the context 
of modelling the yield curve in Section 9.1.4. 
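
The two-step estimation procedure of this section (OLS at each time point, residual variances, then GLS) can be sketched in Python with NumPy. The sketch rests on assumptions of our own: the function names are hypothetical, the loading matrix is a made-up zero-one matrix assigning each of 40 instruments to one of two sectors, and the data are simulated with unequal error variances.

```python
import numpy as np

def ols_factors(X, B):
    """Cross-sectional OLS: row t of the result is (B'B)^{-1} B' X_t."""
    # X is (n, d) and B is (d, p); all n cross-sections are solved at once.
    return np.linalg.solve(B.T @ B, B.T @ X.T).T           # (n, p)


def gls_factors(X, B, upsilon_diag):
    """Cross-sectional GLS as in (6.58) for a diagonal error covariance matrix."""
    W = B / upsilon_diag[:, None]                          # Upsilon^{-1} B, shape (d, p)
    return np.linalg.solve(W.T @ B, W.T @ X.T).T           # (n, p)


def feasible_gls_factors(X, B):
    """Two-step estimate: residual variances from OLS are plugged into GLS."""
    F_ols = ols_factors(X, B)
    resid = X - F_ols @ B.T                                # (n, d) residual vectors
    upsilon_diag = np.var(resid, axis=0, ddof=1)           # off-diagonal terms set to zero
    return gls_factors(X, B, upsilon_diag), upsilon_diag


# Simulated data; the sector structure and all numbers are illustrative assumptions.
rng = np.random.default_rng(2)
n, d, p = 250, 40, 2
B = np.zeros((d, p))
B[: d // 2, 0] = 1.0            # first 20 instruments load on a hypothetical sector 1
B[d // 2 :, 1] = 1.0            # remaining 20 load on a hypothetical sector 2
F_true = rng.normal(size=(n, p))
sd = rng.uniform(0.1, 0.5, size=d)                         # unequal error standard deviations
X = F_true @ B.T + rng.normal(size=(n, d)) * sd

F_gls, upsilon_diag = feasible_gls_factors(X, B)
```

With equal error variances the GLS weights cancel and `gls_factors` reduces to `ols_factors`, in line with the remark on homoscedastic errors above.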


6.4.5 Principal Component Analysis 


The aim of principal component analysis (PCA) is to reduce the dimensionality of 
highly correlated data by finding a small number of uncorrelated linear combinations 
that account for most of the variance of the original data. PCA
is not itself a model, but rather a data-rotation technique. However, it can be used 
as a way of constructing factors for use in factor modelling, and this is the main 
application we consider in this section. 

The key mathematical result behind the technique is the spectral decomposition
theorem of linear algebra, which says that any symmetric matrix $A \in \mathbb{R}^{d \times d}$ can be
written as

$$A = \Gamma \Lambda \Gamma', \tag{6.59}$$

where

(i) $\Lambda = \operatorname{diag}(\lambda_1, \ldots, \lambda_d)$ is the diagonal matrix of eigenvalues of $A$ that, without
loss of generality, are ordered so that $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_d$, and

(ii) $\Gamma$ is an orthogonal matrix satisfying $\Gamma\Gamma' = \Gamma'\Gamma = I_d$ whose columns are
standardized eigenvectors of $A$ (i.e. eigenvectors with length 1).
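
The decomposition (6.59) can be computed numerically. The following Python sketch uses NumPy's `numpy.linalg.eigh`, which is designed for symmetric matrices and returns eigenvalues in ascending order, so we reorder them to match the convention above. The function name and the example matrix are our own illustrative choices.

```python
import numpy as np

def spectral_decomposition(A):
    """Return (Gamma, lam) with A = Gamma @ diag(lam) @ Gamma.T and lam descending."""
    lam, Gamma = np.linalg.eigh(A)        # symmetric input, eigenvalues ascending
    order = np.argsort(lam)[::-1]         # reorder so that lam[0] >= lam[1] >= ...
    return Gamma[:, order], lam[order]


A = np.array([[2.0, 0.5, 0.3],
              [0.5, 1.0, 0.2],
              [0.3, 0.2, 0.5]])           # an arbitrary symmetric matrix
Gamma, lam = spectral_decomposition(A)
A_rebuilt = Gamma @ np.diag(lam) @ Gamma.T
```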


Theoretical principal components. Obviously we can apply this decomposition
to any covariance matrix $\Sigma$, and in this case the positive semidefiniteness of $\Sigma$
ensures that $\lambda_j \geq 0$ for all $j$. Suppose the random vector $X$ has mean vector $\mu$ and
covariance matrix $\Sigma$ and we make the decomposition $\Sigma = \Gamma \Lambda \Gamma'$ as in (6.59). The
principal components transform of $X$ is then defined to be


$$Y = \Gamma'(X - \mu), \tag{6.60}$$


and it can be thought of as a rotation and a recentring of $X$. The $j$th component of
the rotated vector $Y$ is known as the $j$th principal component of $X$ and is given by


$$Y_j = \gamma_j'(X - \mu), \tag{6.61}$$


where $\gamma_j$ is the eigenvector of $\Sigma$ corresponding to the $j$th ordered eigenvalue; this
vector is also known as the $j$th vector of loadings.

Simple calculations show that


$$E(Y) = 0 \quad \text{and} \quad \operatorname{cov}(Y) = \Gamma' \Sigma \Gamma = \Gamma' \Gamma \Lambda \Gamma' \Gamma = \Lambda,$$


so that the principal components of $X$ are uncorrelated and have variances
$\operatorname{var}(Y_j) = \lambda_j$ for all $j$. The components are thus ordered by variance, from largest to
smallest. Moreover, the first principal component can be shown to be the standardized
linear combination of $X$ that has maximal variance among all such combinations;
in other words,


$$\operatorname{var}(\gamma_1' X) = \max\{\operatorname{var}(a'X) : a'a = 1\}.$$


For $j = 2, \ldots, d$, the $j$th principal component can be shown to be the standardized
linear combination of $X$ with maximal variance among all such linear combinations
that are orthogonal to (and hence uncorrelated with) the first $j - 1$ linear combinations.
The final $d$th principal component has minimum variance among standardized
linear combinations of $X$.

To measure the ability of the first few principal components to explain the variance
of $X$, we observe that


$$\sum_{j=1}^d \operatorname{var}(Y_j) = \sum_{j=1}^d \lambda_j = \operatorname{trace}(\Sigma) = \sum_{j=1}^d \operatorname{var}(X_j).$$


If we interpret $\operatorname{trace}(\Sigma) = \sum_{j=1}^d \operatorname{var}(X_j)$ as a measure of the total variance of $X$,
then, for $k \leq d$, the ratio $\sum_{j=1}^k \lambda_j / \sum_{j=1}^d \lambda_j$ represents the amount of this variance
that is explained by the first $k$ principal components.
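
As a small numerical illustration of this ratio, the following Python sketch computes the cumulative proportions of explained variance for an equicorrelation matrix with unit variances. The matrix, the values $d = 10$ and $\rho = 0.3$, and the function name are assumptions made for the illustration. For such a matrix the largest eigenvalue is known to be $1 + (d - 1)\rho$ and the remaining $d - 1$ eigenvalues equal $1 - \rho$, which gives a simple check on the output.

```python
import numpy as np

def explained_variance_ratio(Sigma):
    """Cumulative share of trace(Sigma) explained by the first k principal components."""
    lam = np.linalg.eigvalsh(Sigma)[::-1]       # eigenvalues, largest first
    return np.cumsum(lam) / np.sum(lam)


d, rho = 10, 0.3                                # illustrative values
Sigma = np.full((d, d), rho) + (1.0 - rho) * np.eye(d)   # equicorrelation matrix
ratios = explained_variance_ratio(Sigma)
print(np.round(ratios, 3))
```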


Principal components as factors. We note that, by inverting the principal components
transform (6.60), we obtain


$$X = \mu + \Gamma Y = \mu + \Gamma_1 Y_1 + \Gamma_2 Y_2,$$


where we have partitioned $Y$ into vectors $Y_1 \in \mathbb{R}^k$ and $Y_2 \in \mathbb{R}^{d-k}$, such that $Y_1$
contains the first $k$ principal components, and we have partitioned $\Gamma$ into matrices
$\Gamma_1 \in \mathbb{R}^{d \times k}$ and $\Gamma_2 \in \mathbb{R}^{d \times (d-k)}$ correspondingly. Let us assume that the first $k$
principal components explain a large part of the total variance and we decide to
focus our attention on them and ignore the further principal components in $Y_2$. If
we set $\varepsilon = \Gamma_2 Y_2$, we obtain


$$X = \mu + \Gamma_1 Y_1 + \varepsilon, \tag{6.62}$$


which is reminiscent of the basic factor model (6.50) with the vector $Y_1$ playing
the role of the factors and the matrix $\Gamma_1$ playing the role of the factor loading
matrix. Although the components of the error vector $\varepsilon$ will tend to have small
variances, the assumptions of the factor model are generally violated in (6.62) since
$\varepsilon$ need not have a diagonal covariance matrix and need not be uncorrelated with
$Y_1$. Nevertheless, principal components are often interpreted as factors and used to
develop approximate factor models. We now describe the estimation process that is
followed when data are available.


Sample principal components. Assume that we have a time series of multivariate
data observations $X_1, \ldots, X_n$ with identical distribution, unknown mean vector $\mu$
and covariance matrix $\Sigma$, with the spectral decomposition $\Sigma = \Gamma \Lambda \Gamma'$ as before.

To construct sample principal components we need to estimate the unknown
parameters. We estimate $\mu$ by $\bar{X}$, the sample mean vector, and we estimate $\Sigma$ by
the sample covariance matrix


$$S_x = \frac{1}{n} \sum_{t=1}^n (X_t - \bar{X})(X_t - \bar{X})'.$$

We apply the spectral decomposition (6.59) to the symmetric, positive-semidefinite
matrix $S_x$ to get


$$S_x = G L G', \tag{6.63}$$


where $G$ is the eigenvector matrix, $L = \operatorname{diag}(l_1, \ldots, l_d)$ is the diagonal matrix
consisting of ordered eigenvalues, and we switch to roman letters to emphasize
that these are now calculated from an empirical covariance matrix. The matrix $G$
provides an estimate of $\Gamma$ and $L$ provides an estimate of $\Lambda$.

By analogy with (6.60) we define vectors of sample principal components


$$Y_t = G'(X_t - \bar{X}), \quad t = 1, \ldots, n. \tag{6.64}$$


The $j$th component of $Y_t$ is known as the $j$th sample principal component at time $t$
and is given by

$$Y_{t,j} = g_j'(X_t - \bar{X}),$$


where $g_j$ is the $j$th column of $G$, that is, the eigenvector of $S_x$ corresponding to the
$j$th largest eigenvalue.

The rotated vectors $Y_1, \ldots, Y_n$ have the property that their sample covariance
matrix is $L$, as is easily verified:


$$\frac{1}{n} \sum_{t=1}^n (Y_t - \bar{Y})(Y_t - \bar{Y})' = \frac{1}{n} \sum_{t=1}^n Y_t Y_t' = \frac{1}{n} \sum_{t=1}^n G'(X_t - \bar{X})(X_t - \bar{X})'G = G' S_x G = L.$$

The rotated vectors therefore show no correlation between components, and the
components are ordered by their sample variances, from largest to smallest.
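
The construction of sample principal components, and the property that their sample covariance matrix is $L$, can be verified numerically. The following Python sketch with NumPy is an illustration only: the function name is hypothetical and the data are simulated so that the columns are correlated.

```python
import numpy as np

def sample_principal_components(X):
    """Sample PCA of an (n, d) data matrix using the 1/n covariance estimator."""
    n = X.shape[0]
    X_bar = X.mean(axis=0)
    Xc = X - X_bar                            # centred data
    S_x = Xc.T @ Xc / n                       # sample covariance matrix
    l, G = np.linalg.eigh(S_x)                # eigenvalues in ascending order
    order = np.argsort(l)[::-1]
    l, G = l[order], G[:, order]              # reorder so that l_1 >= ... >= l_d
    Y = Xc @ G                                # row t holds G'(X_t - X_bar)
    return X_bar, G, l, Y


# Simulated correlated data; purely illustrative.
rng = np.random.default_rng(3)
X = rng.normal(size=(300, 4)) @ rng.normal(size=(4, 4))
X_bar, G, l, Y = sample_principal_components(X)
cov_Y = Y.T @ Y / X.shape[0]                  # sample covariance of the rotated vectors
```

Here `cov_Y` should agree with `np.diag(l)` up to rounding error.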


Remark 6.35. In a situation where the different components of the data vectors
$X_1, \ldots, X_n$ have very different sample variances (particularly if they are measured
on very different scales), it is to be expected that the component (or components)
with largest variance will dominate the first loading vector $g_1$ and dominate the first
principal component. In these situations the data are often transformed to have identical
variances, which effectively means that principal component analysis is applied
to the sample correlation matrix $R_x$. Note also that we could derive sample principal
components from a robust estimate of the correlation matrix or a multivariate
dispersion matrix.

We can now use the sample eigenvector matrix $G$ and the sample principal components
$Y_t$ to calibrate an approximate factor model of the form (6.62). We assume
that our data are realizations from the model


$$X_t = \bar{X} + G_1 F_t + \varepsilon_t, \quad t = 1, \ldots, n, \tag{6.65}$$

where $G_1$ consists of the first $k$ columns of $G$ and $F_t = (Y_{t,1}, \ldots, Y_{t,k})'$, $t = 1, \ldots, n$.
The choice of $k$ is based on a subjective choice of the number of sample
principal components that are required to explain a substantial part of the total
sample variance (see Example 6.36).

Equation (6.65) bears a resemblance to the factor model (6.54) except that, in
practice, the errors $\varepsilon_t$ do not generally have a diagonal covariance matrix and are
not generally uncorrelated with $F_t$. Nevertheless, the method is a popular approach
to constructing time series of statistically explanatory factors from multivariate time
series of risk-factor changes.
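
The calibration of (6.65) can be sketched in Python with NumPy. This is a minimal illustration under our own assumptions: the function name `pca_factor_model` is hypothetical, the data are simulated, and the $r^2$ value for each series is computed as one minus the ratio of residual variance to total variance, which is one common way of reporting the share of variation explained by the $k$ factors.

```python
import numpy as np

def pca_factor_model(X, k):
    """Calibrate the approximate factor model (6.65) with k PCA-based factors."""
    n = X.shape[0]
    X_bar = X.mean(axis=0)
    Xc = X - X_bar
    l, G = np.linalg.eigh(Xc.T @ Xc / n)
    order = np.argsort(l)[::-1]
    l, G = l[order], G[:, order]
    G1 = G[:, :k]                             # d x k loading matrix
    F = Xc @ G1                               # n x k factor series, F_t = G1'(X_t - X_bar)
    resid = Xc - F @ G1.T                     # inferred error vectors
    # Share of each series' sample variance explained by the k factors.
    r2 = 1.0 - resid.var(axis=0) / Xc.var(axis=0)
    explained = np.cumsum(l) / np.sum(l)      # cumulative proportion of total variance
    return X_bar, G1, F, resid, r2, explained


# Simulated correlated data; purely illustrative.
rng = np.random.default_rng(4)
X = rng.normal(size=(400, 6)) @ rng.normal(size=(6, 6))
X_bar, G1, F, resid, r2, explained = pca_factor_model(X, k=2)
```

The vector `explained` can guide the choice of $k$, and the correlation matrix of `resid` can be inspected in the same way as the regression residuals in Section 6.4.3.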


Example 6.36 (PCA-based factor model for Dow Jones 30 returns). We consider 
the data in Example 6.34 again. Principal component analysis is applied to the sample 
covariance matrix of the return data and the results are summarized in Figures 6.5 
and 6.6. In the former we see a bar plot of the sample variances of the first eight
principal components $l_j$; the cumulative proportion of the total variance explained
by the components is given above each bar; the first two components explain almost
50% of the variation. In the latter figure the first two loading vectors $g_1$ and $g_2$ are
summarized.

The first vector of loadings is positively weighted for all stocks and can be thought 
of as describing a kind of index portfolio; of course, the weights in the loading vector 
do not sum to 1, but they can be scaled to do so and this gives a so-called principal- 
component-mimicking portfolio. The second vector has positive weights for the 
consumer titles and negative weights for the technology titles; as a portfolio it can 
be thought of as prescribing a programme of short selling of technology to buy 
consumer titles. These first two sample principal components loading vectors are 
used to define factors. 

In Table 6.6 the transpose of the matrix B (containing the loadings estimates in 
the factor model) is shown; the rows are merely the first two loading vectors from the 
principal component analysis. In the third row, values of $r^2$, the so-called coefficient
of determination, are given for each of the univariate regression models, and these 
indicate that more of the variation in the data is explained by the two PCA-based 
factors than was explained by the observed factor in Example 6.34; it seems that the 
model is best able to explain Intel returns. 

The next ten lines give the correlation matrix implied by the factor model (corresponding
to $\hat{\Sigma}^{(F)}$). Compared with the true sample correlation matrix in Example 6.34
this seems to pick up more of the structure than did the correlation matrix implied
by the observed factor model.

The final ten lines show the estimated correlation matrix of the residuals from the
regression model, but only those elements that exceed 0.1 in absolute value. The
residuals are again less correlated than the original data, but there are quite a number

Figure 6.5. Bar plot of the sample variances $l_j$ of the first eight principal components; the
cumulative proportion of the total variance explained by the components is given above each
bar ($\sum_{j=1}^k l_j / \sum_{j=1}^d l_j$, $k = 1, \ldots, 8$).

Figure 6.6. Bar plot summarizing the loading vectors $g_1$ and $g_2$ defining the
first two principal components: (a) factor 1 loadings; (b) factor 2 loadings.

Table 6.6. The first two lines give estimates of the transpose of $B$ for a factor model fitted
to ten Dow Jones 30 stocks, where the factors are constructed from the first two sample
principal components. The third row gives $r^2$ values for the univariate regression model for
each individual time series. The next ten lines give the correlation matrix implied by the
factor model. The final ten lines show the estimated correlation matrix of the residuals from
the regression model, with entries less than 0.1 in absolute value omitted. See Example 6.36
for full details.


           MO     KO     EK    HWP   INTC   MSFT    IBM    MCD    WMT    DIS

B'       0.20   0.19   0.16   0.45   0.51   0.44   0.32   0.18   0.24   0.22
         0.39   0.34   0.23  -0.26  -0.45  -0.10  -0.07   0.31   0.39   0.37
r^2      0.35   0.42   0.18   0.55   0.75   0.56   0.35   0.34   0.42   0.41

MO       1.00   0.39   0.25   0.17   0.13   0.25   0.20   0.35   0.38   0.38
KO       0.39   1.00   0.28   0.21   0.17   0.29   0.23   0.38   0.42   0.42
EK       0.25   0.28   1.00   0.18   0.15   0.22   0.18   0.25   0.28   0.27
HWP      0.17   0.21   0.18   1.00   0.64   0.55   0.43   0.20   0.23   0.23
INTC     0.13   0.17   0.15   0.64   1.00   0.61   0.48   0.16   0.19   0.18
MSFT     0.25   0.29   0.22   0.55   0.61   1.00   0.44   0.27   0.31   0.30
IBM      0.20   0.23   0.18   0.43   0.48   0.44   1.00   0.21   0.25   0.24
MCD      0.35   0.38   0.25   0.20   0.16   0.27   0.21   1.00   0.38   0.37
WMT      0.38   0.42   0.28   0.23   0.19   0.31   0.25   0.38   1.00   0.41
DIS      0.38   0.42   0.27   0.23   0.18   0.30   0.24   0.37   0.41   1.00

MO       1.00  -0.19  -0.15                              -0.19  -0.37  -0.26
KO      -0.19   1.00  -0.15          0.11                       -0.16  -0.17
EK      -0.15  -0.15   1.00                              -0.15  -0.16  -0.16
HWP                           1.00  -0.63  -0.37  -0.14
INTC            0.11         -0.63   1.00  -0.24  -0.31
MSFT                         -0.37  -0.24   1.00  -0.22
IBM                          -0.14  -0.31  -0.22   1.00
MCD     -0.19         -0.15                               1.00  -0.19  -0.19
WMT     -0.37  -0.16  -0.16                              -0.19   1.00  -0.23
DIS     -0.26  -0.17  -0.16                              -0.19  -0.23   1.00


of larger entries, indicating imperfections in the factor-model representation of the 
data. In particular, we have introduced a number of larger negative correlations into 
the residuals; in practice, we seldom expect to find a factor model in which the 
residuals have a covariance matrix that appears perfectly diagonal. 


Notes and Comments 


For a more detailed discussion of factor models see the paper by Connor (1995), 
which provides a comparison of the three types of model, and the book by Campbell, 
Lo and MacKinlay (1997). An excellent practical introduction to these models with 
examples in S-Plus is Zivot and Wang (2003). Other accounts of factor models and 
PCA in finance are found in Alexander (2001) and Tsay (2002). 

Much of our discussion of factor models, multivariate regression and principal 
components is based on Mardia, Kent and Bibby (1979). Statistical approaches to 
factor models are also treated in Seber (1984) and Johnson and Wichern (2002); these 
include classical statistical factor analysis, which we have omitted from our account.


7 Copulas and Dependence


In this chapter we use the concept of a copula to look more closely at the issue of 
modelling a random vector of dependent financial risk factors. In Section 7.1 we 
define copulas, give a number of examples and establish their basic properties. 

Dependence concepts and dependence measures are considered in Section 7.2, 
beginning with the notion of perfect positive dependence or comonotonicity. This 
is a very important concept in risk management because it formalizes the idea 
of undiversifiable risks and therefore has important implications for determining 
risk-based capital. Dependence measures provide a scalar-valued summary of the 
strength of dependence between risks and there are many different measures; we 
consider linear correlation and two further classes of measures—rank correlations 
and coefficients of tail dependence—that can be directly related to copulas. 

Linear correlation is a standard measure for describing the dependence between 
financial assets but it has a number of limitations, particularly when we leave the 
multivariate normal and elliptical distributions of Chapter 6 behind. Rank corre- 
lations are mainly used to calibrate copulas to data, while tail dependence is an 
important theoretical concept, since it addresses the phenomenon of joint extreme 
values in several risk factors, which is one of the major concerns in financial risk 
management (see also Section 3.2). 

In Section 7.3 we look in more detail at the copulas of normal mixture distribu- 
tions; these are the copulas that are used implicitly when normal mixture distribu- 
tions are fitted to multivariate risk-factor change data, as in Chapter 6. In Section 7.4 
we consider Archimedean copulas, which are widely used as dependence models 
in low-dimensional applications and which have also found an important niche in 
portfolio credit risk modelling, as will be seen in Chapters 11 and 12. The chapter 
ends with a section on fitting copulas to data. 


7.1 Copulas 


In a sense, every joint distribution function for a random vector of risk factors 
implicitly contains both a description of the marginal behaviour of individual risk 
factors and a description of their dependence structure; the copula approach provides 
a way of isolating the description of the dependence structure. We view copulas as 
an extremely useful concept and see several advantages in introducing and studying 
them.

First, copulas help in the understanding of dependence at a deeper level. They
allow us to see the potential pitfalls of approaches to dependence that focus only 
on correlation and show us how to define a number of useful alternative depend- 
ence measures. Copulas express dependence on a quantile scale, which is use- 
ful for describing the dependence of extreme outcomes and is natural in a risk- 
management context, where VaR has led us to think of risk in terms of quantiles of 
loss distributions. 

Moreover, copulas facilitate a bottom-up approach to multivariate model build- 
ing. This is particularly useful in risk management, where we very often have a 
much better idea about the marginal behaviour of individual risk factors than we do 
about their dependence structure. An example is furnished by credit risk, where the 
individual default risk of an obligor, while in itself difficult to estimate, is at least 
something we can get a better handle on than the dependence among default risks 
for several obligors. The copula approach allows us to combine our more developed 
marginal models with a variety of possible dependence models and to investigate 
the sensitivity of risk to the dependence specification. Since the copulas we present 
are easily simulated, they lend themselves particularly well to Monte Carlo studies 
of risk. Of course, while the flexibility of the copula approach allows us, in theory, to 
build an unlimited number of models with given marginal distributions, we should 
stress that it is important to have a good understanding of the behaviour of different 
copulas and their appropriateness for particular kinds of modelling application. 


7.1.1 Basic Properties 


Definition 7.1 (copula). A $d$-dimensional copula is a distribution function on $[0, 1]^d$
with standard uniform marginal distributions.


We reserve the notation $C(\boldsymbol{u}) = C(u_1, \dots, u_d)$ for the multivariate dfs that are
copulas. Hence $C$ is a mapping of the form $C\colon [0, 1]^d \to [0, 1]$, i.e. a mapping of
the unit hypercube into the unit interval. The following three properties must hold.


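(1) $C(u_1, \dots, u_d) = 0$ if $u_i = 0$ for any $i$.
(2) $C(1, \dots, 1, u_i, 1, \dots, 1) = u_i$ for all $i \in \{1, \dots, d\}$, $u_i \in [0, 1]$.
(3) For all $(a_1, \dots, a_d), (b_1, \dots, b_d) \in [0, 1]^d$ with $a_i \le b_i$ we have

\[
\sum_{i_1=1}^{2} \cdots \sum_{i_d=1}^{2} (-1)^{i_1 + \cdots + i_d}\, C(u_{1 i_1}, \dots, u_{d i_d}) \ge 0, \tag{7.1}
\]

where $u_{j1} = a_j$ and $u_{j2} = b_j$ for all $j \in \{1, \dots, d\}$.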


Note that the second property corresponds to the requirement that marginal distributions
are uniform. The so-called rectangle inequality in (7.1) ensures that if the
random vector $(U_1, \dots, U_d)'$ has df $C$, then $P(a_1 < U_1 \le b_1, \dots, a_d < U_d \le b_d)$
is non-negative. These three properties characterize a copula; if a function $C$ fulfils
them, then it is a copula. Note also that, for $2 \le k < d$, the $k$-dimensional margins
of a $d$-dimensional copula are themselves copulas.
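
The rectangle inequality (7.1) can be checked numerically for a candidate function. The following Python sketch is an illustration only; the function names are our own and are not part of the text. It evaluates the left-hand side of (7.1) by summing over the $2^d$ corners of the box with lower corner a and upper corner b.

```python
from itertools import product


def independence(u):
    """Product of the arguments (the independence copula)."""
    result = 1.0
    for ui in u:
        result *= ui
    return result


def upper_bound(u):
    """min(u_1, ..., u_d)."""
    return min(u)


def lower_bound(u):
    """max(u_1 + ... + u_d + 1 - d, 0); a copula only when d = 2."""
    return max(sum(u) + 1 - len(u), 0.0)


def rectangle_sum(C, a, b):
    """Left-hand side of (7.1): signed sum of C over the corners of the box."""
    d = len(a)
    total = 0.0
    for idx in product((1, 2), repeat=d):
        # index 1 selects a_j, index 2 selects b_j, as in (7.1)
        corner = [a[j] if idx[j] == 1 else b[j] for j in range(d)]
        total += (-1) ** sum(idx) * C(corner)
    return total


vol_indep = rectangle_sum(independence, [0.2, 0.3], [0.6, 0.9])
vol_upper = rectangle_sum(upper_bound, [0.2, 0.3], [0.6, 0.9])
vol_lower_3d = rectangle_sum(lower_bound, [0.5] * 3, [1.0] * 3)
```

For the independence copula the sum is simply the product of the side lengths of the box. For the function $\max(u_1 + u_2 + u_3 - 2, 0)$ the box $[\tfrac12, 1]^3$ gives a negative value, so this function violates property (3) and is not a copula in three dimensions.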

Some preliminaries. In working with copulas we must be familiar with the operations
of probability and quantile transformation, as well as the properties of generalized
inverses, which are summarized in Section A.1.2. The following elementary
proposition is found in many probability texts.


Proposition 7.2. Let $F$ be a distribution function and let $F^{\leftarrow}$ denote its generalized
inverse, i.e. the function $F^{\leftarrow}(u) = \inf\{x : F(x) \ge u\}$.


(1) Quantile transform. If $U \sim U(0, 1)$ has a standard uniform distribution, then
$P(F^{\leftarrow}(U) \le x) = F(x)$.
(2) Probability transform. If $X$ has df $F$, where $F$ is a continuous univariate df,
then $F(X) \sim U(0, 1)$.

Proof. Let $x \in \mathbb{R}$ and $u \in (0, 1)$. For the first part use the fact that

\[
F(x) \ge u \iff F^{\leftarrow}(u) \le x
\]

(see Proposition A.3 (iv) in Section A.1.2), from which it follows that

\[
P(F^{\leftarrow}(U) \le x) = P(U \le F(x)) = F(x).
\]

For the second part we infer that

\[
P(F(X) \le u) = P(F^{\leftarrow} \circ F(X) \le F^{\leftarrow}(u))
= P(X \le F^{\leftarrow}(u)) = F \circ F^{\leftarrow}(u)
= u,
\]

where the first equality follows from the fact that $F^{\leftarrow}$ is strictly increasing
(Proposition A.3 (ii)), the second follows from Proposition A.4, and the final equality is
Proposition A.3 (viii).


Proposition 7.2 (1) is the key to stochastic simulation. If we can generate a uniform 
variate U and compute the inverse of a df F, then we can sample from that df. Both 
parts of the proposition taken together imply that we can transform risks with a 
particular continuous df to have any other continuous distribution. For example, if 
$X$ has a standard normal distribution, then $\Phi(X)$ is uniform by Proposition 7.2 (2),
and, since the quantile function of a standard exponential df $G$ is $G^{\leftarrow}(u) = -\ln(1 - u)$,
the transformed variable $Y := -\ln(1 - \Phi(X))$ has a standard exponential
distribution by Proposition 7.2 (1).
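
The normal-to-exponential transformation just described can be sketched in a few lines of Python using only the standard library. The function names, the seed and the sample size are illustrative assumptions; the code simply composes the probability transform with the exponential quantile function.

```python
import math
import random
from statistics import NormalDist, fmean

STANDARD_NORMAL = NormalDist()  # its cdf method plays the role of Phi


def exponential_quantile(u):
    """Quantile function of the standard exponential df: -ln(1 - u)."""
    return -math.log(1.0 - u)


def normal_to_exponential(x):
    """Probability transform with Phi, then quantile transform."""
    return exponential_quantile(STANDARD_NORMAL.cdf(x))


rng = random.Random(1)  # fixed seed so that the run is reproducible
xs = [rng.gauss(0.0, 1.0) for _ in range(20000)]
ys = [normal_to_exponential(x) for x in xs]
sample_mean = fmean(ys)  # a standard exponential variable has mean 1
```

Because the transformation is strictly increasing, the ranks of the ys coincide with the ranks of the xs; only the marginal distribution has changed.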


Sklar’s Theorem. The importance of copulas in the study of multivariate distribu- 
tion functions is summarized by the following elegant theorem, which shows, firstly, 
that all multivariate dfs contain copulas and, secondly, that copulas may be used in 
conjunction with univariate dfs to construct multivariate dfs. 


Theorem 7.3 (Sklar 1959). Let $F$ be a joint distribution function with margins
$F_1, \dots, F_d$. Then there exists a copula $C\colon [0, 1]^d \to [0, 1]$ such that, for all
$x_1, \dots, x_d$ in $\bar{\mathbb{R}} = [-\infty, \infty]$,

\[
F(x_1, \dots, x_d) = C(F_1(x_1), \dots, F_d(x_d)). \tag{7.2}
\]

If the margins are continuous, then $C$ is unique; otherwise $C$ is uniquely determined
on $\operatorname{Ran} F_1 \times \operatorname{Ran} F_2 \times \cdots \times \operatorname{Ran} F_d$,
where $\operatorname{Ran} F_i = F_i(\bar{\mathbb{R}})$ denotes the range of $F_i$.
Conversely, if $C$ is a copula and $F_1, \dots, F_d$ are univariate distribution functions,
then the function $F$ defined in (7.2) is a joint distribution function with margins
$F_1, \dots, F_d$.


Proof. We prove the existence and uniqueness of a copula in the case when
$F_1, \dots, F_d$ are continuous and the converse statement in its general form. Remark
7.4 explains how the general result may be proved with the more complicated
distributional transform, which is given in Appendix A.1.3.

Let $X$ be a random vector with df $F$ and continuous margins $F_1, \dots, F_d$, and,
for $i = 1, \dots, d$, set $U_i = F_i(X_i)$. By Proposition 7.2 (2), $U_i \sim U(0, 1)$, and, by
Proposition A.4 in the appendix, $F_i^{\leftarrow}(U_i) = X_i$, almost surely. Let $C$ denote the
distribution function of $(U_1, \dots, U_d)$, which is a copula by Definition 7.1. For any
$x_1, \dots, x_d$ in $\bar{\mathbb{R}} = [-\infty, \infty]$ we infer, using Proposition A.3 (iv), that

\begin{align*}
F(x_1, \dots, x_d) &= P(X_1 \le x_1, \dots, X_d \le x_d) \tag{7.3} \\
&= P(F_1^{\leftarrow}(U_1) \le x_1, \dots, F_d^{\leftarrow}(U_d) \le x_d) \\
&= P(U_1 \le F_1(x_1), \dots, U_d \le F_d(x_d)) \\
&= C(F_1(x_1), \dots, F_d(x_d)),
\end{align*}

and thus we obtain the identity (7.2).
If we evaluate (7.2) at the arguments $x_i = F_i^{\leftarrow}(u_i)$, $0 \le u_i \le 1$, $i = 1, \dots, d$,
and use Proposition A.3 (viii), we obtain

\[
C(u_1, \dots, u_d) = F(F_1^{\leftarrow}(u_1), \dots, F_d^{\leftarrow}(u_d)), \tag{7.4}
\]

which gives an explicit representation of $C$ in terms of $F$ and its margins, and thus
shows uniqueness.

For the converse statement assume that $C$ is a copula and that $F_1, \dots, F_d$ are
arbitrary univariate dfs. We construct a random vector with df (7.2) by taking $U$ to
be any random vector with df $C$ and setting $X := (F_1^{\leftarrow}(U_1), \dots, F_d^{\leftarrow}(U_d))$. We
can then follow exactly the same sequence of equations commencing with (7.3) to
establish that the df of $X$ satisfies (7.2).


Remark 7.4. The general form of Sklar’s Theorem can be proved by using the
distributional transform in Appendix A.1.3 instead of the probability transform. For
a random vector $X$ with arbitrary df $F$ and margins $F_1, \dots, F_d$ we can set
$U_i = \tilde{F}_i(X_i, V_i)$, where $\tilde{F}_i$ is the modified distribution function of $X_i$ defined in (A.6) and
$V_1, \dots, V_d$ are uniform rvs that are independent of $X_1, \dots, X_d$. Proposition A.6
shows that $U_i \sim U(0, 1)$ and $F_i^{\leftarrow}(U_i) = X_i$, almost surely, so an otherwise-identical
proof may be used. The non-uniqueness of the copula is related to the fact
that there are different ways of choosing the $V_i$ variables; they need not themselves
be independent and could in fact be identical for all $i$.

Formulas (7.2) and (7.4) are fundamental in dealing with copulas. The former
shows how joint distributions $F$ are formed by coupling together marginal distributions
with copulas $C$; the latter shows how copulas are extracted from multivariate
dfs with continuous margins. Moreover, (7.4) shows how copulas express dependence
on a quantile scale, since the value $C(u_1, \dots, u_d)$ is the joint probability that
$X_1$ lies below its $u_1$-quantile, $X_2$ lies below its $u_2$-quantile, and so on. Sklar’s Theorem
also suggests that, in the case of continuous margins, it is natural to define the
notion of the copula of a distribution.


Definition 7.5 (copula of F). If the random vector $X$ has joint df $F$ with continuous
marginal distributions $F_1, \dots, F_d$, then the copula of $F$ (or $X$) is the df $C$ of
$(F_1(X_1), \dots, F_d(X_d))$.


Discrete distributions. The copula concept is slightly less natural for multivariate 
discrete distributions. This is because there is more than one copula that can be used 
to join the margins to form the joint df, as the following example shows. 


Example 7.6 (copulas of bivariate Bernoulli). Let $(X_1, X_2)$ have a bivariate
Bernoulli distribution satisfying

\begin{align*}
P(X_1 = 0, X_2 = 0) &= \tfrac{1}{8}, & P(X_1 = 1, X_2 = 1) &= \tfrac{3}{8}, \\
P(X_1 = 0, X_2 = 1) &= \tfrac{2}{8}, & P(X_1 = 1, X_2 = 0) &= \tfrac{2}{8}.
\end{align*}

Clearly, $P(X_1 = 0) = P(X_2 = 0) = \tfrac{3}{8}$ and the marginal distributions $F_1$ and $F_2$ of
$X_1$ and $X_2$ are the same. From Sklar’s Theorem we know that

\[
P(X_1 \le x_1, X_2 \le x_2) = C(P(X_1 \le x_1), P(X_2 \le x_2))
\]

for all $x_1, x_2$ and some copula $C$. Since $\operatorname{Ran} F_1 = \operatorname{Ran} F_2 = \{0, \tfrac{3}{8}, 1\}$, clearly the
only constraint on $C$ is that $C(\tfrac{3}{8}, \tfrac{3}{8}) = \tfrac{1}{8}$. Any copula fulfilling this constraint is a
copula of $(X_1, X_2)$, and there are infinitely many such copulas.
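
To make the non-uniqueness concrete, the following sketch exhibits two different copulas that both meet the constraint, which we take here to be $C(\tfrac38, \tfrac38) = \tfrac18$. The construction is our own illustration: it uses the fact that a convex combination of copulas is again a copula and mixes the product, minimum and $\max(u_1 + u_2 - 1, 0)$ copulas with weights chosen to hit the constraint. Exact rational arithmetic avoids rounding issues.

```python
from fractions import Fraction as Fr


def M(u1, u2):
    """Bivariate minimum copula."""
    return min(u1, u2)


def W(u1, u2):
    """Bivariate copula max(u1 + u2 - 1, 0)."""
    return max(u1 + u2 - 1, Fr(0))


def Pi(u1, u2):
    """Bivariate product copula."""
    return u1 * u2


def mix(lam, C1, C2):
    """Convex combination lam * C1 + (1 - lam) * C2 of two copulas."""
    return lambda u1, u2: lam * C1(u1, u2) + (1 - lam) * C2(u1, u2)


# Weights (illustrative) chosen so that both mixtures equal 1/8 at (3/8, 3/8).
copula_a = mix(Fr(8, 9), Pi, W)
copula_b = mix(Fr(1, 3), M, W)

p = Fr(3, 8)
half = Fr(1, 2)
value_a, value_b = copula_a(p, p), copula_b(p, p)        # both 1/8
other_a, other_b = copula_a(half, half), copula_b(half, half)  # these differ
```

Both mixtures reproduce the joint df of $(X_1, X_2)$ on $\operatorname{Ran} F_1 \times \operatorname{Ran} F_2$, yet they disagree elsewhere in the unit square, for instance at $(\tfrac12, \tfrac12)$.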


Invariance. A useful property of the copula of a distribution is its invariance under 
strictly increasing transformations of the marginals. In view of Sklar’s Theorem and 
this invariance property, we interpret the copula of a distribution as a very natural 
way of representing the dependence structure of that distribution, certainly in the 
case of continuous margins. 


Proposition 7.7. Let $(X_1, \dots, X_d)$ be a random vector with continuous margins
and copula $C$ and let $T_1, \dots, T_d$ be strictly increasing functions. Then
$(T_1(X_1), \dots, T_d(X_d))$ also has copula $C$.


Proof. We first note that $(T_1(X_1), \dots, T_d(X_d))$ is also a random vector with continuous
margins and that it will have the same distribution regardless of whether each $T_i$
is left continuous or right continuous at its (countably many) points of discontinuity.
By starting with the expression (7.4) for the unique copula $C$ of $(X_1, \dots, X_d)$, using
the fact that $\{X_i \le x\} = \{T_i(X_i) \le T_i(x)\}$ for a strictly increasing transformation
$T_i$ and then applying Proposition A.5 for left-continuous transformations, we obtain

\begin{align*}
C(u_1, \dots, u_d) &= P(X_1 \le F_1^{\leftarrow}(u_1), \dots, X_d \le F_d^{\leftarrow}(u_d)) \\
&= P(T_1(X_1) \le T_1 \circ F_1^{\leftarrow}(u_1), \dots, T_d(X_d) \le T_d \circ F_d^{\leftarrow}(u_d)) \\
&= P(T_1(X_1) \le F_{T_1(X_1)}^{\leftarrow}(u_1), \dots, T_d(X_d) \le F_{T_d(X_d)}^{\leftarrow}(u_d)),
\end{align*}

which shows that $C$ is also the unique copula of $(T_1(X_1), \dots, T_d(X_d))$.


Fréchet bounds. We close this section by establishing the important Fréchet 
bounds for copulas, which turn out to have important dependence interpretations 
that are discussed further in Sections 7.1.2 and 7.2.1. 


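Theorem 7.8. For every copula $C(u_1, \dots, u_d)$ we have the bounds

\[
\max\Big(\sum_{i=1}^{d} u_i + 1 - d,\ 0\Big) \le C(\boldsymbol{u}) \le \min(u_1, \dots, u_d). \tag{7.5}
\]

Proof. The second inequality follows from the fact that, for all $i$,

\[
\bigcap_{1 \le j \le d} \{U_j \le u_j\} \subset \{U_i \le u_i\}.
\]

For the first inequality observe that

\begin{align*}
C(\boldsymbol{u}) &= P\Big(\bigcap_{1 \le i \le d} \{U_i \le u_i\}\Big) = 1 - P\Big(\bigcup_{1 \le i \le d} \{U_i > u_i\}\Big) \\
&\ge 1 - \sum_{i=1}^{d} P(U_i > u_i) = 1 - d + \sum_{i=1}^{d} u_i.
\end{align*}

The lower and upper bounds will be given the notation $W(u_1, \dots, u_d)$ and
$M(u_1, \dots, u_d)$, respectively.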


Remark 7.9. Although we give Fréchet bounds for a copula, Fréchet bounds may
be given for any multivariate df. For a multivariate df $F$ with margins $F_1, \dots, F_d$
we establish by similar reasoning that

\[
\max\Big(\sum_{i=1}^{d} F_i(x_i) + 1 - d,\ 0\Big) \le F(\boldsymbol{x}) \le \min(F_1(x_1), \dots, F_d(x_d)), \tag{7.6}
\]

so we have bounds for $F$ in terms of its own marginal distributions.
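
As a quick numerical sanity check of the Fréchet bounds, the sketch below evaluates the lower bound, the product of the arguments (the independence copula introduced in the next section) and the upper bound at randomly chosen points of the four-dimensional unit cube. The dimension, seed and number of points are arbitrary choices made for illustration.

```python
import math
import random


def frechet_lower(u):
    """max(u_1 + ... + u_d + 1 - d, 0)."""
    return max(sum(u) + 1 - len(u), 0.0)


def frechet_upper(u):
    """min(u_1, ..., u_d)."""
    return min(u)


def product_copula(u):
    """Product of the arguments."""
    return math.prod(u)


rng = random.Random(0)
points = [[rng.random() for _ in range(4)] for _ in range(1000)]

TOL = 1e-12  # guards against floating-point rounding only
bounds_hold = all(
    frechet_lower(u) - TOL <= product_copula(u) <= frechet_upper(u) + TOL
    for u in points
)
```

A check of this kind cannot prove the inequality, but it is a cheap way to catch a coding error in a copula implementation.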


7.1.2 Examples of Copulas 


We provide a number of examples of copulas in this section and these are subdivided
into three categories: fundamental copulas represent a number of important special
dependence structures; implicit copulas are extracted from well-known multivariate
distributions using Sklar’s Theorem, but do not necessarily possess simple closed-form
expressions; explicit copulas have simple closed-form expressions and follow
mathematical constructions known to yield copulas. Note, however, that implicit
and explicit are not mutually exclusive categories, and some copulas may have both
implicit and explicit representations, as shown later in Example 7.14.


Fundamental copulas. The independence copula is

\[
\Pi(u_1, \dots, u_d) = \prod_{i=1}^{d} u_i. \tag{7.7}
\]

It is clear from Sklar’s Theorem, and equation (7.2) in particular, that rvs with
continuous distributions are independent if and only if their dependence structure is
given by (7.7).

The comonotonicity copula is the Fréchet upper bound copula from (7.5):

\[
M(u_1, \dots, u_d) = \min(u_1, \dots, u_d). \tag{7.8}
\]

Observe that this special copula is the joint df of the random vector $(U, \dots, U)$,
where $U \sim U(0, 1)$. Suppose that the rvs $X_1, \dots, X_d$ have continuous dfs and are
perfectly positively dependent in the sense that they are almost surely strictly increasing
functions of each other, so that $X_i = T_i(X_1)$ almost surely for $i = 2, \dots, d$.
By Proposition 7.7, the copula of $(X_1, \dots, X_d)$ and the copula of $(X_1, \dots, X_1)$
are the same. But the copula of $(X_1, \dots, X_1)$ is just the df of $(U, \dots, U)$, where
$U = F_1(X_1)$, i.e. the copula (7.8).

The countermonotonicity copula is the two-dimensional Fréchet lower bound
copula from (7.5) given by

\[
W(u_1, u_2) = \max(u_1 + u_2 - 1, 0). \tag{7.9}
\]

This copula is the joint df of the random vector $(U, 1 - U)$, where $U \sim U(0, 1)$.
If $X_1$ and $X_2$ have continuous dfs and are perfectly negatively dependent in the
sense that $X_2$ is almost surely a strictly decreasing function of $X_1$, then (7.9) is their
copula.

We discuss both perfect positive and perfect negative dependence in more detail 
in Section 7.2.1, where we see that an extension of the countermonotonicity concept 
to dimensions higher than two is not possible. 

Perspective pictures and contour plots for the three fundamental copulas are given 
in Figure 7.1. The Fréchet bounds (7.5) imply that all bivariate copulas lie between 
the surfaces in (a) and (c). 


Implicit copulas. If $Y \sim N_d(\mu, \Sigma)$ is a multivariate normal random vector, then
its copula is a so-called Gauss copula (or Gaussian copula). Since the operation
of standardizing the margins amounts to applying a series of strictly increasing
transformations, Proposition 7.7 implies that the copula of $Y$ is exactly the same as
the copula of $X \sim N_d(0, P)$, where $P = \wp(\Sigma)$ is the correlation matrix of $Y$. By
Definition 7.5 this copula is given by

\begin{align*}
C_P^{\mathrm{Ga}}(\boldsymbol{u}) &= P(\Phi(X_1) \le u_1, \dots, \Phi(X_d) \le u_d) \\
&= \Phi_P(\Phi^{-1}(u_1), \dots, \Phi^{-1}(u_d)), \tag{7.10}
\end{align*}

where $\Phi$ denotes the standard univariate normal df and $\Phi_P$ denotes the joint df of
$X$. The notation $C_P^{\mathrm{Ga}}$ emphasizes that the copula is parametrized by the $\tfrac{1}{2} d(d - 1)$
parameters of the correlation matrix; in two dimensions we write $C_\rho^{\mathrm{Ga}}$, where
$\rho = \rho(X_1, X_2)$.

Figure 7.1. (a)-(c) Perspective plots and (d)-(f) contour plots of the three fundamental
copulas: (a), (d) countermonotonicity, (b), (e) independence and (c), (f) comonotonicity.
Note that these are plots of distribution functions.

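The Gauss copula does not have a simple closed form, but can be expressed as an
integral over the density of $X$; in two dimensions for $|\rho| < 1$ we have, using (7.10),
that

\[
C_\rho^{\mathrm{Ga}}(u_1, u_2)
= \int_{-\infty}^{\Phi^{-1}(u_1)} \int_{-\infty}^{\Phi^{-1}(u_2)}
\frac{1}{2\pi (1 - \rho^2)^{1/2}}
\exp\left( \frac{-(s_1^2 - 2\rho s_1 s_2 + s_2^2)}{2(1 - \rho^2)} \right) ds_1\, ds_2.
\]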

Note that both the independence and comonotonicity copulas are special cases of
the Gauss copula. If $P = I_d$, we obtain the independence copula (7.7); if $P = J_d$,
the $d \times d$ matrix consisting entirely of ones, then we obtain the comonotonicity
copula (7.8). Also, for $d = 2$ and $\rho = -1$ the Gauss copula is equal to the countermonotonicity
copula (7.9). Thus in two dimensions the Gauss copula can be thought
of as a dependence structure that interpolates between perfect positive and negative
dependence, where the parameter $\rho$ represents the strength of dependence.

Perspective plots and contour lines of the bivariate Gauss copula with $\rho = 0.7$
are shown in Figure 7.2 (a),(c); these may be compared with the contour lines of
the independence and perfect dependence copulas in Figure 7.1. Note that these
pictures show contour lines of distribution functions and not densities; a picture of
the Gauss copula density is given in Figure 7.5.

Figure 7.2. (a), (b) Perspective plots and (c), (d) contour plots of the Gaussian and Gumbel
copulas, with parameters $\rho = 0.7$ and $\theta = 2$, respectively. Note that these are plots of
distribution functions; a picture of the Gauss copula density is given in Figure 7.5.

In the same way that we can extract a copula from the multivariate normal distribution,
we can extract an implicit copula from any other distribution with continuous
marginal dfs. For example, the $d$-dimensional t copula takes the form

\[
C_{\nu, P}^{t}(\boldsymbol{u}) = t_{\nu, P}(t_\nu^{-1}(u_1), \dots, t_\nu^{-1}(u_d)), \tag{7.11}
\]

where $t_\nu$ is the df of a standard univariate t distribution with $\nu$ degrees of freedom,
$t_{\nu, P}$ is the joint df of the vector $X \sim t_d(\nu, 0, P)$, and $P$ is a correlation matrix.
As in the case of the Gauss copula, if $P = J_d$ then we obtain the comonotonicity
copula (7.8). However, in contrast to the Gauss copula, if $P = I_d$ we do not obtain
the independence copula (assuming $\nu < \infty$) since uncorrelated multivariate t-distributed
rvs are not independent (see Lemma 6.5).


Explicit copulas. While the Gauss and t copulas are copulas implied by well-known
multivariate dfs and do not themselves have simple closed forms, we can
write down a number of copulas that do have simple closed forms. An example is
the bivariate Gumbel copula:

\[
C_\theta^{\mathrm{Gu}}(u_1, u_2) = \exp\big(-((-\ln u_1)^\theta + (-\ln u_2)^\theta)^{1/\theta}\big), \quad 1 \le \theta < \infty. \tag{7.12}
\]

If $\theta = 1$ we obtain the independence copula as a special case, and the limit of $C_\theta^{\mathrm{Gu}}$
as $\theta \to \infty$ is the two-dimensional comonotonicity copula. Thus the Gumbel copula
interpolates between independence and perfect dependence and the parameter $\theta$
represents the strength of dependence. Perspective plot and contour lines for the
Gumbel copula with parameter $\theta = 2$ are shown in Figure 7.2 (b),(d). They appear
to be very similar to the picture for the Gauss copula, but Example 7.13 will show
that the Gaussian and Gumbel dependence structures are quite different.

A further example is the bivariate Clayton copula:

\[
C_\theta^{\mathrm{Cl}}(u_1, u_2) = (u_1^{-\theta} + u_2^{-\theta} - 1)^{-1/\theta}, \quad 0 \le \theta < \infty. \tag{7.13}
\]

The case $\theta = 0$ should be interpreted as the limit of (7.13) as $\theta \to 0$, which is the
independence copula; as $\theta \to \infty$ we approach the two-dimensional comonotonicity
copula.

The Gumbel and Clayton copulas belong to the Archimedean copula family and 
we provide more discussion of this family in Sections 7.4 and 15.2. 
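
The closed forms (7.12) and (7.13) are easy to code, and the limiting cases mentioned above can then be inspected numerically. The following Python sketch is illustrative; the function names and the test point $(0.3, 0.6)$ are our own choices, and very small or very large parameter values stand in for the limits.

```python
import math


def gumbel_copula(u1, u2, theta):
    """Bivariate Gumbel copula (7.12) for theta >= 1 and u1, u2 in (0, 1]."""
    s = (-math.log(u1)) ** theta + (-math.log(u2)) ** theta
    return math.exp(-(s ** (1.0 / theta)))


def clayton_copula(u1, u2, theta):
    """Bivariate Clayton copula (7.13) for theta > 0 and u1, u2 in (0, 1]."""
    return (u1 ** -theta + u2 ** -theta - 1.0) ** (-1.0 / theta)


u1, u2 = 0.3, 0.6
independence_value = u1 * u2          # product copula at (u1, u2)
comonotonic_value = min(u1, u2)       # minimum copula at (u1, u2)

gumbel_at_one = gumbel_copula(u1, u2, 1.0)         # equals u1 * u2
gumbel_large = gumbel_copula(u1, u2, 200.0)        # close to min(u1, u2)
clayton_small = clayton_copula(u1, u2, 1e-6)       # close to u1 * u2
clayton_large = clayton_copula(u1, u2, 200.0)      # close to min(u1, u2)
```

For intermediate parameter values both families return values strictly between the independence and comonotonicity values at this point, in line with the interpolation property described in the text.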


7.1.3 Meta Distributions 


The converse statement of Sklar’s Theorem provides a very powerful technique
for constructing multivariate distributions with arbitrary margins and copulas; we
know that if we start with a copula $C$ and margins $F_1, \dots, F_d$, then
$F(\boldsymbol{x}) := C(F_1(x_1), \dots, F_d(x_d))$ defines a multivariate df with margins $F_1, \dots, F_d$.

Consider, for example, building a distribution with the Gauss copula $C_P^{\mathrm{Ga}}$ but arbitrary
margins; such a model is sometimes called a meta-Gaussian distribution. We
extend the meta terminology to other distributions, so, for example, a meta-$t_\nu$ distribution
has the copula $C_{\nu, P}^{t}$ and arbitrary margins, and a meta-Clayton distribution
has the Clayton copula and arbitrary margins.


7.1.4 Simulation of Copulas and Meta Distributions 


It should be apparent from the way the implicit copulas in Section 7.1.2 were 
extracted from well-known distributions that it is particularly easy to sample from 
these copulas, provided we can sample from the distribution from which they are 
extracted. The steps are summarized in the following algorithm. 


Algorithm 7.10 (simulation of implicit copulas).

(1) Generate $X \sim F$, where $F$ is a df with continuous margins $F_1, \dots, F_d$.

(2) Return $U = (F_1(X_1), \dots, F_d(X_d))'$. The random vector $U$ has df $C$, where
$C$ is the unique copula of $F$.

Particular examples are given in the following algorithms.

Algorithm 7.11 (simulation of Gauss copula).

(1) Generate $Z \sim N_d(0, P)$ using Algorithm 6.2.

(2) Return $U = (\Phi(Z_1), \dots, \Phi(Z_d))'$, where $\Phi$ is the standard normal df. The
random vector $U$ has df $C_P^{\mathrm{Ga}}$.

Algorithm 7.12 (simulation of t copula).

(1) Generate $X \sim t_d(\nu, 0, P)$ using Algorithm 6.10.

(2) Return $U = (t_\nu(X_1), \dots, t_\nu(X_d))'$, where $t_\nu$ denotes the df of a standard
univariate t distribution. The random vector $U$ has df $C_{\nu, P}^{t}$.
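
A minimal sketch of the Gauss copula algorithm in the bivariate case is given below, using only the Python standard library. Algorithm 6.2 is not reproduced here; we assume that it amounts to multiplying independent standard normals by a Cholesky factor of the correlation matrix, which in two dimensions reduces to the two lines computing z1 and z2. The function name, seed and sample size are illustrative.

```python
import math
import random
from statistics import NormalDist


def simulate_gauss_copula(n, rho, seed=0):
    """Return n points (u1, u2) from the bivariate Gauss copula with parameter rho."""
    rng = random.Random(seed)
    phi = NormalDist().cdf  # standard normal df
    sample = []
    for _ in range(n):
        w1, w2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        # Step (1): correlated standard normals via the 2 x 2 Cholesky factor.
        z1 = w1
        z2 = rho * w1 + math.sqrt(1.0 - rho * rho) * w2
        # Step (2): probability transform of each component.
        sample.append((phi(z1), phi(z2)))
    return sample


points = simulate_gauss_copula(20000, 0.7, seed=42)

# Empirical check of C(0.5, 0.5) against the known orthant probability
# 1/4 + arcsin(rho) / (2 pi) of the bivariate normal distribution.
empirical = sum(1 for u1, u2 in points if u1 <= 0.5 and u2 <= 0.5) / len(points)
theoretical = 0.25 + math.asin(0.7) / (2.0 * math.pi)
```

The same two-step pattern applies to the t copula, with the normal generator and df replaced by their t counterparts; this requires a t df, which the standard library does not provide.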

Figure 7.3. Two thousand simulated points from the (a) Gaussian, (b) Gumbel,
(c) Clayton and (d) t copulas. See Example 7.13 for parameter choices and interpretation.


The Clayton and Gumbel copulas present slightly more challenging simulation 
problems and we give algorithms in Section 7.4 after looking at the structure of these 
copulas in more detail. These algorithms will, however, be used in Example 7.13 
below. 

Assume that the problem of generating realizations $U$ from a particular copula has
been solved. The converse of Sklar’s Theorem shows us how we can sample from
meta distributions that combine this copula with an arbitrary choice of marginal
distribution. If $U$ has df $C$, then we use quantile transformation to obtain
$X := (F_1^{\leftarrow}(U_1), \dots, F_d^{\leftarrow}(U_d))'$, which is a random vector with margins $F_1, \dots, F_d$ and
multivariate df $C(F_1(x_1), \dots, F_d(x_d))$. This technique is extremely useful in Monte
Carlo studies of risk and will be discussed further in the context of Example 7.58.
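
The quantile transformation step can be sketched as follows. The code assumes that a copula sample is already available as a list of points in the unit square; the three points used here are placeholders rather than simulated data, and the choice of a standard exponential margin and a Pareto margin with df $1 - (\kappa/(\kappa + x))^\alpha$ is purely illustrative.

```python
import math


def exponential_quantile(u):
    """Quantile function of the standard exponential df."""
    return -math.log(1.0 - u)


def pareto_quantile(u, alpha=3.0, kappa=1.0):
    """Quantile function of the df 1 - (kappa / (kappa + x))**alpha, x >= 0."""
    return kappa * ((1.0 - u) ** (-1.0 / alpha) - 1.0)


def to_meta_sample(copula_sample, quantile_functions):
    """Apply the i-th quantile function to the i-th coordinate of every point."""
    return [
        tuple(q(u) for q, u in zip(quantile_functions, point))
        for point in copula_sample
    ]


# Placeholder copula points; in practice these come from a copula simulation.
copula_sample = [(0.5, 0.5), (0.9, 0.1), (0.2, 0.7)]
meta_sample = to_meta_sample(copula_sample, [exponential_quantile, pareto_quantile])
```

Since each quantile function is increasing, the transformed points inherit the copula of the input points while acquiring the chosen margins.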


Example 7.13 (various copulas compared). In Figure 7.3 we show 2000 simulated
points from four copulas: the Gauss copula (7.10) with parameter $\rho = 0.7$;
the Gumbel copula (7.12) with parameter $\theta = 2$; the Clayton copula (7.13) with
parameter $\theta = 2.2$; and the t copula (7.11) with parameters $\nu = 4$ and $\rho = 0.71$.
In Figure 7.4 we transform these points componentwise using the quantile function
of the standard normal distribution to get realizations from four different meta
distributions with standard normal margins. The Gaussian picture shows data generated
from a standard bivariate normal distribution with correlation 70%. The
other pictures show data generated from unusual distributions that have been created
using the converse of Sklar’s Theorem; the parameters of the copulas have
been chosen so that all of these distributions have a linear correlation that is
roughly 70%.

Figure 7.4. Two thousand simulated points from four distributions with standard normal
margins, constructed using the copula data from Figure 7.3 ((a) Gaussian, (b) Gumbel,
(c) Clayton and (d) t). The Gaussian picture shows points from a standard bivariate normal
with correlation 70%; other pictures show distributions with non-Gauss copulas constructed
to have a linear correlation of roughly 70%. See Example 7.13 for parameter choices and
interpretation.

Considering the Gumbel picture, these are bivariate data with a meta-Gumbel
distribution with df $C_\theta^{\mathrm{Gu}}(\Phi(x_1), \Phi(x_2))$, where $\theta = 2$. The Gumbel copula causes
this distribution to have upper tail dependence, a concept defined formally in Section
7.2.4. Roughly speaking, there is much more of a tendency for $X_2$ to be extreme
when $X_1$ is extreme, and vice versa, a phenomenon that would obviously be worrying
when $X_1$ and $X_2$ are interpreted as potential financial losses. The Clayton
copula turns out to have lower tail dependence, and the t copula to have both lower
and upper tail dependence; in contrast, the Gauss copula does not have tail dependence
and this can also be glimpsed in Figure 7.2. In the upper-right-hand corner
the contours of the Gauss copula are more like those of the independence copula of
Figure 7.1 than the perfect dependence copula.

Note that the qualitative differences between the distributions are explained by
the copula alone; we can construct similar pictures where the marginal distributions
are exponential or Student t, or any other univariate distribution.


7.1.5 Further Properties of Copulas 


Survival copulas. A version of Sklar’s identity (7.2) also applies to multivariate survival
functions of distributions. Let $X$ be a random vector with multivariate survival
function $\bar{F}$, marginal dfs $F_1, \dots, F_d$ and marginal survival functions $\bar{F}_1, \dots, \bar{F}_d$,
i.e. $\bar{F}_i = 1 - F_i$. We have the identity

\[
\bar{F}(x_1, \dots, x_d) = \hat{C}(\bar{F}_1(x_1), \dots, \bar{F}_d(x_d)) \tag{7.14}
\]

for a copula $\hat{C}$, which is known as a survival copula. In the case where $F_1, \dots, F_d$
are continuous this identity is easily established by noting that

\begin{align*}
\bar{F}(x_1, \dots, x_d) &= P(X_1 > x_1, \dots, X_d > x_d) \\
&= P(1 - F_1(X_1) \le \bar{F}_1(x_1), \dots, 1 - F_d(X_d) \le \bar{F}_d(x_d)),
\end{align*}

so (7.14) follows by writing $\hat{C}$ for the distribution function of $\boldsymbol{1} - U$, where
$U := (F_1(X_1), \dots, F_d(X_d))$ and $\boldsymbol{1}$ is the vector of ones in $\mathbb{R}^d$. In general, the term survival
copula of a copula $C$ will be used to denote the df of $\boldsymbol{1} - U$ when $U$ has df $C$.

In the case where $F_1, \dots, F_d$ are continuous and strictly increasing we can give
a representation for $\hat{C}$ in (7.14) by setting $x_i = \bar{F}_i^{-1}(u_i)$ for $i = 1, \dots, d$ to obtain

\[
\hat{C}(u_1, \dots, u_d) = \bar{F}(\bar{F}_1^{-1}(u_1), \dots, \bar{F}_d^{-1}(u_d)). \tag{7.15}
\]

The next example illustrates the derivation of a survival copula from a bivariate
survival function using (7.15).


Example 7.14 (survival copula of a bivariate Pareto distribution). A well-known
generalization of the important univariate Pareto distribution is the bivariate Pareto
distribution with survivor function given by

\[
\bar{F}(x_1, x_2) = \left( \frac{x_1 + \kappa_1}{\kappa_1} + \frac{x_2 + \kappa_2}{\kappa_2} - 1 \right)^{-\alpha},
\quad x_1, x_2 \ge 0, \quad \alpha, \kappa_1, \kappa_2 > 0.
\]

It is easily confirmed that the marginal survivor functions are given by
$\bar{F}_i(x) = (\kappa_i / (\kappa_i + x))^\alpha$, $i = 1, 2$, and we then infer using (7.15) that the survival copula is
given by $\hat{C}(u_1, u_2) = (u_1^{-1/\alpha} + u_2^{-1/\alpha} - 1)^{-\alpha}$. Comparison with (7.13) reveals that
this is the Clayton copula with parameter $\theta = 1/\alpha$.
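
The calculation in this example can be verified numerically. The sketch below assumes the bivariate Pareto survivor function $((x_1 + \kappa_1)/\kappa_1 + (x_2 + \kappa_2)/\kappa_2 - 1)^{-\alpha}$ with marginal survivor functions $(\kappa_i/(\kappa_i + x))^\alpha$, evaluates the survival copula through the representation (7.15), and compares it with the Clayton form (7.13) at $\theta = 1/\alpha$. The parameter values and function names are arbitrary illustrations.

```python
def pareto_joint_survival(x1, x2, alpha, k1, k2):
    """Bivariate Pareto survivor function, as assumed in the lead-in."""
    return ((x1 + k1) / k1 + (x2 + k2) / k2 - 1.0) ** (-alpha)


def pareto_marginal_survival_inverse(u, alpha, k):
    """Solve (k / (k + x))**alpha = u for x."""
    return k * (u ** (-1.0 / alpha) - 1.0)


def pareto_survival_copula(u1, u2, alpha, k1, k2):
    """Survival copula via (7.15): joint survivor function at the inverted margins."""
    x1 = pareto_marginal_survival_inverse(u1, alpha, k1)
    x2 = pareto_marginal_survival_inverse(u2, alpha, k2)
    return pareto_joint_survival(x1, x2, alpha, k1, k2)


def clayton_copula(u1, u2, theta):
    """Bivariate Clayton copula (7.13)."""
    return (u1 ** -theta + u2 ** -theta - 1.0) ** (-1.0 / theta)


alpha, k1, k2 = 2.5, 1.0, 3.0  # illustrative parameter values
grid = [(0.1, 0.9), (0.3, 0.6), (0.5, 0.5), (0.95, 0.2)]
max_gap = max(
    abs(pareto_survival_copula(u1, u2, alpha, k1, k2)
        - clayton_copula(u1, u2, 1.0 / alpha))
    for u1, u2 in grid
)
```

Note that the scale parameters drop out of the survival copula entirely, which is a further instance of the invariance of copulas under strictly monotone transformations of the margins.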


The useful concept of radial symmetry can be expressed in terms of copulas and 
survival copulas. 


Definition 7.15 (radial symmetry). A random vector $X$ (or its df) is radially symmetric
about a point $a$ if $X - a \overset{d}{=} a - X$.


An elliptical random vector $X \sim E_d(\mu, \Sigma, \psi)$ is obviously radially symmetric
about $\mu$. If $U$ has df $C$, where $C$ is a copula, then the only possible centre of
symmetry is $(0.5, \dots, 0.5)$, so $C$ is radially symmetric if

\[
(U_1 - 0.5, \dots, U_d - 0.5) \overset{d}{=} (0.5 - U_1, \dots, 0.5 - U_d) \iff U \overset{d}{=} \boldsymbol{1} - U.
\]
Thus if a copula $C$ is radially symmetric and $\hat{C}$ is its survival copula, we have $\hat{C} = C$.
It is easily seen that the copulas of elliptical distributions are radially symmetric but
the Gumbel and Clayton copulas are not.

Survival copulas should not be confused with the survival functions of copulas,
which are not themselves copulas. Since copulas are simply multivariate dfs, they
have survival or tail functions, which we denote by $\bar{C}$. If $U$ has df $C$ and the survival
copula of $C$ is $\hat{C}$, then

\begin{align*}
\bar{C}(u_1, \dots, u_d) &= P(U_1 > u_1, \dots, U_d > u_d) \\
&= P(1 - U_1 \le 1 - u_1, \dots, 1 - U_d \le 1 - u_d) \\
&= \hat{C}(1 - u_1, \dots, 1 - u_d).
\end{align*}

A useful relationship between a copula and its survival copula in the bivariate case
is that

\[
\hat{C}(1 - u_1, 1 - u_2) = 1 - u_1 - u_2 + C(u_1, u_2). \tag{7.16}
\]

Conditional distributions of copulas. It is often of interest to look at conditional
distributions of copulas. We concentrate on two dimensions and suppose
that $(U_1, U_2)$ has df $C$. Since a copula is an increasing continuous function in each
argument,

\begin{align*}
C_{U_2 \mid U_1}(u_2 \mid u_1) = P(U_2 \le u_2 \mid U_1 = u_1)
&= \lim_{\delta \to 0} \frac{C(u_1 + \delta, u_2) - C(u_1, u_2)}{\delta} \\
&= \frac{\partial}{\partial u_1} C(u_1, u_2), \tag{7.17}
\end{align*}

where this partial derivative exists almost everywhere (see Nelsen (2006) for precise
details). The conditional distribution is a distribution on the interval $[0, 1]$ that is
only a uniform distribution in the case where $C$ is the independence copula. A risk-management
interpretation of the conditional distribution is the following. Suppose
continuous risks $(X_1, X_2)$ have the (unique) copula $C$. Then $1 - C_{U_2 \mid U_1}(q \mid p)$ is
the probability that $X_2$ exceeds its $q$th quantile given that $X_1$ attains its $p$th quantile.
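
For copulas with a closed form the partial derivative in (7.17) can be computed explicitly. As an illustration we take the bivariate Clayton copula $(u_1^{-\theta} + u_2^{-\theta} - 1)^{-1/\theta}$, whose derivative with respect to $u_1$ is $u_1^{-\theta - 1}(u_1^{-\theta} + u_2^{-\theta} - 1)^{-1/\theta - 1}$; this formula is our own calculation and is checked in the sketch against a central finite difference. The parameter value and evaluation points are arbitrary.

```python
def clayton_copula(u1, u2, theta):
    """Bivariate Clayton copula for theta > 0."""
    return (u1 ** -theta + u2 ** -theta - 1.0) ** (-1.0 / theta)


def clayton_conditional(u2, u1, theta):
    """P(U2 <= u2 | U1 = u1): partial derivative of the copula in u1."""
    s = u1 ** -theta + u2 ** -theta - 1.0
    return u1 ** (-theta - 1.0) * s ** (-1.0 / theta - 1.0)


def finite_difference(u2, u1, theta, h=1e-6):
    """Central-difference approximation to the limit in (7.17)."""
    return (clayton_copula(u1 + h, u2, theta)
            - clayton_copula(u1 - h, u2, theta)) / (2.0 * h)


theta = 2.2
exact = clayton_conditional(0.6, 0.3, theta)
approx = finite_difference(0.6, 0.3, theta)

# Probability that X2 exceeds its 95% quantile given X1 sits at its 99% quantile.
exceedance = 1.0 - clayton_conditional(0.95, 0.99, theta)
```

The conditional df is also the basis of the conditional sampling method for bivariate copulas: draw $U_1$ uniformly, then draw $U_2$ by inverting the conditional df at an independent uniform variate.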


Copula densities. Copulas do not always have joint densities; the comonotonicity
and countermonotonicity copulas are examples of copulas that are not absolutely
continuous. However, the parametric copulas that we have met so far do have densities
given by

\[
c(u_1, \dots, u_d) = \frac{\partial^d C(u_1, \dots, u_d)}{\partial u_1 \cdots \partial u_d}, \tag{7.18}
\]

and we are sometimes required to calculate them, e.g. if we wish to fit copulas to
data by maximum likelihood.

It is useful to note that, for the implicit copula of an absolutely continuous joint df
$F$ with strictly increasing, continuous marginal dfs $F_1, \dots, F_d$, we may differentiate
$C(u_1, \dots, u_d) = F(F_1^{\leftarrow}(u_1), \dots, F_d^{\leftarrow}(u_d))$ to see that the copula density is given
by

\[
c(u_1, \dots, u_d) = \frac{f(F_1^{-1}(u_1), \dots, F_d^{-1}(u_d))}{f_1(F_1^{-1}(u_1)) \cdots f_d(F_d^{-1}(u_d))}, \tag{7.19}
\]

where $f$ is the joint density of $F$, $f_1, \dots, f_d$ are the marginal densities, and
$F_1^{-1}, \dots, F_d^{-1}$ are the ordinary inverses of the marginal dfs.

Figure 7.5. Perspective plot of the density of the bivariate
Gauss copula with parameter $\rho = 0.3$.

Figure 7.6. Perspective plot of the density of the bivariate
t copula with parameters $\nu = 4$ and $\rho = 0.3$.

Using this technique we can calculate the densities of the Gaussian and t copulas 
as shown in Figures 7.5 and 7.6, respectively. Observe that the t copula assigns much 
more probability mass to the corners of the unit square; this may be explained by 
the tail dependence of the t copula, as discussed in Section 7.2.4. 
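
As an illustration of the ratio formula for the density of an implicit copula, the sketch below computes the bivariate Gauss copula density as the bivariate normal density divided by the product of the two standard normal densities, each evaluated at the normal quantiles of the arguments. It uses only the Python standard library; the function names are our own. The closed-form expression used for comparison is obtained by simplifying the ratio by hand.

```python
import math
from statistics import NormalDist

STD = NormalDist()  # standard normal: provides pdf, cdf and inv_cdf


def bivariate_normal_pdf(x1, x2, rho):
    """Density of a standard bivariate normal with correlation rho, |rho| < 1."""
    det = 1.0 - rho * rho
    q = (x1 * x1 - 2.0 * rho * x1 * x2 + x2 * x2) / det
    return math.exp(-0.5 * q) / (2.0 * math.pi * math.sqrt(det))


def gauss_copula_density(u1, u2, rho):
    """Joint density at the quantiles divided by the product of marginal densities."""
    x1, x2 = STD.inv_cdf(u1), STD.inv_cdf(u2)
    return bivariate_normal_pdf(x1, x2, rho) / (STD.pdf(x1) * STD.pdf(x2))


def gauss_copula_density_closed_form(u1, u2, rho):
    """The same ratio after cancelling terms by hand."""
    x1, x2 = STD.inv_cdf(u1), STD.inv_cdf(u2)
    det = 1.0 - rho * rho
    exponent = -(rho * rho * (x1 * x1 + x2 * x2) - 2.0 * rho * x1 * x2) / (2.0 * det)
    return math.exp(exponent) / math.sqrt(det)


density_indep = gauss_copula_density(0.3, 0.8, 0.0)   # independence: density 1
density = gauss_copula_density(0.3, 0.8, 0.3)
density_reflected = gauss_copula_density(0.7, 0.2, 0.3)  # radial symmetry
```

A function of this kind is what a maximum likelihood fit of the Gauss copula would evaluate at each pseudo-observation, usually on the log scale.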


Exchangeability. 


Definition 7.16 (exchangeability). A random vector $X$ is exchangeable if

\[
(X_1, \dots, X_d) \overset{d}{=} (X_{\Pi(1)}, \dots, X_{\Pi(d)})
\]

for any permutation $(\Pi(1), \dots, \Pi(d))$ of $(1, \dots, d)$.


We will refer to a copula as an exchangeable copula if it is the df of an exchangeable
random vector of uniform variates $U$. Clearly, for such a copula we must have

\[
C(u_1, \dots, u_d) = C(u_{\Pi(1)}, \dots, u_{\Pi(d)}) \tag{7.20}
\]

for all possible permutations of the arguments of $C$. Such copulas will prove useful
in modelling the default dependence for homogeneous groups of companies in the
context of credit risk.




Examples of exchangeable copulas include both the Gumbel and Clayton copulas
as well as the Gaussian and t copulas, $C_P^{\mathrm{Ga}}$ and $C_{\nu, P}^{t}$, in the case where $P$ is an
equicorrelation matrix, i.e. a matrix of the form $P = \rho J_d + (1 - \rho) I_d$, where $J_d$ is
the square matrix consisting entirely of ones and $\rho \ge -1/(d - 1)$.

It follows from (7.20) and (7.17) that if the df of the vector $(U_1, U_2)$ is an exchangeable
bivariate copula, then

\[
P(U_2 \le u_2 \mid U_1 = u_1) = P(U_1 \le u_2 \mid U_2 = u_1), \tag{7.21}
\]

which implies quite strong symmetry. If a random vector $(X_1, X_2)$ has such a copula,
then the probability that $X_2$ exceeds its $u_2$-quantile given that $X_1$ attains its $u_1$-quantile
is exactly the same as the probability that $X_1$ exceeds its $u_2$-quantile given
that $X_2$ attains its $u_1$-quantile. Not all bivariate copulas must satisfy (7.21). For an
example of a non-exchangeable bivariate copula see Section 15.2.2 and Figure 15.4.


Notes and Comments 


Sklar’s Theorem is first found in Sklar (1959); see also Schweizer and Sklar (1983)
or Nelsen (2006) for a proof of the result. The elegant proof using the distributional
transform, as mentioned in Remark 7.4, is due to Rüschendorf (2009). A systematic
development of the theory of copulas, particularly bivariate ones, with many
examples is found in Nelsen (2006). Pitfalls related to discontinuity of marginal
distributions are presented in Marshall (1996), and a primer on copulas for discrete
count data is given in Genest and Nešlehová (2007). For extensive lists of
parametric copula families see Hutchinson and Lai (1990), Joe (1997) and Nelsen
(2006).

A reference on copula methods in finance is Cherubini, Luciano and Vecchiato 
(2004). Embrechts (2009) contains some references to the discussion concerning 
the pros and cons of copula modelling in insurance and finance. 


7.2 Dependence Concepts and Measures 


In this section we first provide formal definitions of the concepts of perfect positive 
and negative dependence (comonotonicity and countermonotonicity) and we present 
some of the properties of perfectly dependent random vectors. 

We then focus on three kinds of dependence measure: the usual Pearson linear 
correlation, rank correlation, and the coefficients of tail dependence. All of these 
dependence measures yield a scalar measurement of the “strength of the depend- 
ence” for a pair of rvs (X1, X2), although the nature and properties of the measure 
are different in each case. 

The rank correlations and tail-dependence coefficients are copula-based depend- 
ence measures. In contrast to ordinary correlation, these measures are functions of 
the copula only and can thus be used in the parametrization of copulas, as discussed 
in Section 7.5. 

7.2.1 Perfect Dependence


We define the concepts of perfect positive and perfect negative dependence using 
the fundamental copulas of Section 7.1.2. Alternative names for these concepts are 
comonotonicity and countermonotonicity, respectively. 


Comonotonicity. This concept may be defined in a number of equivalent ways. We 
give a copula-based definition and then derive alternative representations that show 
that comonotonic random variables can be thought of as undiversifiable random 
variables. 


Definition 7.17 (comonotonicity). The rvs $X_1, \dots, X_d$ are said to be comonotonic
if they admit as copula the Fréchet upper bound $M(u_1, \dots, u_d) = \min(u_1, \dots, u_d)$.


The following result shows that comonotonic rvs are monotonically increasing 
functions of a single rv. In other words, there is a single source of risk and the 
comonotonic variables move deterministically in lockstep with that risk. 


Proposition 7.18. $X_1, \dots, X_d$ are comonotonic if and only if

$$(X_1, \dots, X_d) \overset{d}{=} (v_1(Z), \dots, v_d(Z)) \tag{7.22}$$

for some rv $Z$ and increasing functions $v_1, \dots, v_d$.

Proof. Assume that $X_1, \dots, X_d$ are comonotonic according to Definition 7.17. Let
$U$ be any uniform rv and write $F, F_1, \dots, F_d$ for the joint df and marginal dfs of
$X_1, \dots, X_d$, respectively. From (7.2) we have

$$\begin{aligned}
F(x_1, \dots, x_d) &= \min(F_1(x_1), \dots, F_d(x_d)) \\
&= P(U \le \min(F_1(x_1), \dots, F_d(x_d))) \\
&= P(U \le F_1(x_1), \dots, U \le F_d(x_d)) \\
&= P(F_1^{\leftarrow}(U) \le x_1, \dots, F_d^{\leftarrow}(U) \le x_d)
\end{aligned}$$

for any $U \sim U(0, 1)$, where we use Proposition A.3 (iv) in the last equality. It
follows that

$$(X_1, \dots, X_d) \overset{d}{=} (F_1^{\leftarrow}(U), \dots, F_d^{\leftarrow}(U)), \tag{7.23}$$

which is of the form (7.22). Conversely, if (7.22) holds, then

$$F(x_1, \dots, x_d) = P(v_1(Z) \le x_1, \dots, v_d(Z) \le x_d) = P(Z \in A_1, \dots, Z \in A_d),$$

where each $A_i$ is an interval of the form $(-\infty, k_i]$ or $(-\infty, k_i)$, so one interval $A_i$
is a subset of all other intervals. Therefore,

$$\begin{aligned}
F(x_1, \dots, x_d) &= \min(P(Z \in A_1), \dots, P(Z \in A_d)) \\
&= \min(F_1(x_1), \dots, F_d(x_d)),
\end{aligned}$$

which proves comonotonicity.


In the case of rvs with continuous marginal distributions we have a simpler and 
stronger result. 

Corollary 7.19. Let $X_1, \dots, X_d$ be rvs with continuous dfs. They are comonotonic
if and only if for every pair $(i, j)$ we have $X_j = T_{ji}(X_i)$ almost surely for some
increasing transformation $T_{ji}$.


Proof. The result follows from the proof of Proposition 7.18 by noting that the rv
$U$ may be taken to be $F_i(X_i)$ for any $i$. Without loss of generality set $d = 2$ and
$i = 1$ and use (7.23) and Proposition A.4 to obtain

$$(X_1, X_2) \overset{d}{=} (F_1^{\leftarrow} \circ F_1(X_1), F_2^{\leftarrow} \circ F_1(X_1)) \overset{\text{a.s.}}{=} (X_1, F_2^{\leftarrow} \circ F_1(X_1)).$$


Comonotone additivity of quantiles. A very important result for comonotonic rvs 
is the additivity of the quantile function as shown in the following proposition. In 
addition to the VaR risk measure, the property of so-called comonotone additivity 
will be shown to apply to a class of risk measures known as distortion risk measures 
in Section 8.2.1; this class includes expected shortfall. 


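Proposition 7.20. Let $0 < \alpha < 1$ and $X_1, \dots, X_d$ be comonotonic rvs with dfs
$F_1, \dots, F_d$. Then

$$F_{X_1 + \cdots + X_d}^{\leftarrow}(\alpha) = F_1^{\leftarrow}(\alpha) + \cdots + F_d^{\leftarrow}(\alpha). \tag{7.24}$$

Proof. For ease of notation take $d = 2$. From Proposition 7.18 we have that
$(X_1, X_2) \overset{d}{=} (F_1^{\leftarrow}(U), F_2^{\leftarrow}(U))$ for some $U \sim U(0, 1)$. It follows that

$$F_{X_1 + X_2}^{\leftarrow}(\alpha) = F_{T(U)}^{\leftarrow}(\alpha),$$

where $T$ is the increasing left-continuous function given by $T(x) = F_1^{\leftarrow}(x) +
F_2^{\leftarrow}(x)$. The result follows by applying Proposition A.5 to get

$$F_{T(U)}^{\leftarrow}(\alpha) = T(F_U^{\leftarrow}(\alpha)) = T(\alpha) = F_1^{\leftarrow}(\alpha) + F_2^{\leftarrow}(\alpha).$$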
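
As a rough numerical illustration of (7.23) and (7.24), the following Python sketch builds a comonotonic pair from a single uniform variable and compares the empirical quantile of the sum with the sum of the empirical marginal quantiles. The margins (standard exponential and standard lognormal), the sample size, the seed and the helper names `empirical_quantile` and `comonotonic_pair` are our own illustrative assumptions and do not come from the text.

```python
import math
import random
from statistics import NormalDist


def empirical_quantile(values, alpha):
    """Generalized inverse of the empirical df: smallest x with F_n(x) >= alpha."""
    ordered = sorted(values)
    k = max(math.ceil(alpha * len(ordered)) - 1, 0)
    return ordered[k]


def comonotonic_pair(n, seed=1):
    """Sample (F1^{-1}(U), F2^{-1}(U)) with Exp(1) and lognormal(0, 1) margins."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    x1, x2 = [], []
    for _ in range(n):
        u = rng.random()
        while u == 0.0:  # the normal quantile is undefined at 0
            u = rng.random()
        x1.append(-math.log1p(-u))                  # Exp(1) quantile
        x2.append(math.exp(std_normal.inv_cdf(u)))  # lognormal quantile
    return x1, x2


x1, x2 = comonotonic_pair(5000)
sums = [a + b for a, b in zip(x1, x2)]
alpha = 0.99
quantile_of_sum = empirical_quantile(sums, alpha)
sum_of_quantiles = empirical_quantile(x1, alpha) + empirical_quantile(x2, alpha)
print(quantile_of_sum, sum_of_quantiles)
```

Because both components are increasing functions of the same uniform draw, the same observation supplies the order statistic in all three samples, so the two printed numbers should agree up to floating-point rounding.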


Countermonotonicity. In an analogous manner to the way we have defined 
comonotonicity, we define countermonotonicity as a copula concept, albeit restricted
to the case d = 2. 


Definition 7.21 (countermonotonicity). The rvs $X_1$ and $X_2$ are countermonotonic
if they have as copula the Fréchet lower bound $W(u_1, u_2) = \max(u_1 + u_2 - 1, 0)$.


Proposition 7.22. $X_1$ and $X_2$ are countermonotonic if and only if

$$(X_1, X_2) \overset{d}{=} (v_1(Z), v_2(Z))$$

for some rv $Z$ with $v_1$ increasing and $v_2$ decreasing, or vice versa.


Proof. The proof is similar to that of Proposition 7.18 and is given in Embrechts, 
McNeil and Straumann (2002). 

Remark 7.23. In the case where $X_1$ and $X_2$ are continuous we have the simpler
result that countermonotonicity is equivalent to $X_2 = T(X_1)$ almost surely for some
decreasing function $T$.


The concept of countermonotonicity does not generalize to higher dimensions. 
The Fréchet lower bound $W(u_1, \dots, u_d)$ is not itself a copula for $d > 2$ since it
is not a proper distribution function and does not satisfy (7.1), as the following 
example taken from Nelsen (2006, Exercise 2.36) shows. 


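Example 7.24 (the Fréchet lower bound is not a copula for $d > 2$). Consider
the $d$-cube $[\tfrac12, 1]^d \subset [0, 1]^d$. If the Fréchet lower bound for copulas were a df on
$[0, 1]^d$, then (7.1) would imply that the probability mass $P(d)$ of this cube would
be given by

$$\begin{aligned}
P(d) &= \max(1 + \cdots + 1 - d + 1, 0) - d \max(\tfrac12 + 1 + \cdots + 1 - d + 1, 0) \\
&\quad + \binom{d}{2} \max(\tfrac12 + \tfrac12 + 1 + \cdots + 1 - d + 1, 0) - \cdots \\
&\quad + \max(\tfrac12 + \cdots + \tfrac12 - d + 1, 0) \\
&= 1 - \tfrac12 d.
\end{aligned}$$

Only the first two terms are non-zero, and $P(d)$ is negative for $d > 2$. Hence the
Fréchet lower bound cannot be a copula for $d > 2$.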
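
The calculation in Example 7.24 can be checked numerically. The sketch below evaluates the signed sum of $W$ over the $2^d$ vertices of the cube $[\tfrac12, 1]^d$, which is the mass a genuine df would assign to that cube. The function names `frechet_lower` and `cube_mass` are hypothetical helpers introduced only for this illustration.

```python
from itertools import product


def frechet_lower(u):
    """W(u_1, ..., u_d) = max(u_1 + ... + u_d - d + 1, 0)."""
    return max(sum(u) - len(u) + 1.0, 0.0)


def cube_mass(func, lower, upper):
    """Signed sum of func over the 2^d vertices of the box [lower, upper]."""
    d = len(lower)
    total = 0.0
    for picks in product((0, 1), repeat=d):  # 1 = upper endpoint, 0 = lower endpoint
        vertex = [upper[i] if p else lower[i] for i, p in enumerate(picks)]
        n_lower = d - sum(picks)  # each lower endpoint contributes a factor -1
        total += (-1) ** n_lower * func(vertex)
    return total


for d in (2, 3, 4, 5):
    mass = cube_mass(frechet_lower, [0.5] * d, [1.0] * d)
    print(d, mass, 1.0 - d / 2.0)
```

For $d = 2$ the mass is zero, which is admissible; for every $d > 2$ the computed mass is negative, in line with $P(d) = 1 - \tfrac12 d$.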


Some additional insight into the impossibility of countermonotonicity for dimen- 
sions higher than two is given by the following simple example. 


Example 7.25. Let $X_1$ be a positive-valued rv and take $X_2 = 1/X_1$ and $X_3 = e^{-X_1}$.
Clearly, $(X_1, X_2)$ and $(X_1, X_3)$ are countermonotonic random vectors. However,
$(X_2, X_3)$ is comonotonic and the copula of the vector $(X_1, X_2, X_3)$ is the df of the
vector $(U, 1 - U, 1 - U)$, which may be calculated to be

$$C(u_1, u_2, u_3) = \max(\min(u_2, u_3) + u_1 - 1, 0).$$


7.2.2 Linear Correlation 


Correlation plays a central role in financial theory, but it is important to realize that 
the concept is only really a natural one in the context of multivariate normal or, 
more generally, elliptical models. As we have seen, elliptical distributions are fully 
described by a mean vector, a covariance matrix and a characteristic generator func- 
tion. Since means and variances are features of marginal distributions, the copulas of 
elliptical distributions can be thought of as depending only on the correlation matrix 
and characteristic generator; the correlation matrix thus has a natural parametric 
role in these models, which it does not have in more general multivariate models. 
Our discussion of correlation will focus on the shortcomings of correlation and the 
subtle pitfalls that the naive user of correlation may encounter when moving away 
from elliptical models. The concept of copulas will help us to illustrate these pitfalls. 

The correlation $\rho(X_1, X_2)$ between rvs $X_1$ and $X_2$ was defined in (6.3). It is a
measure of linear dependence and takes values in $[-1, 1]$. If $X_1$ and $X_2$ are independent,
then $\rho(X_1, X_2) = 0$, but it is important to be clear that the converse is false:
the uncorrelatedness of $X_1$ and $X_2$ does not in general imply their independence.
Examples are provided by the class of uncorrelated normal mixture distributions (see
Lemma 6.5) and the class of spherical distributions (with the single exception of the
multivariate normal). For an even simpler example, we can take $X_1 = Z \sim N(0, 1)$
and $X_2 = Z^2$; these are clearly dependent rvs but have zero correlation, since
$\operatorname{cov}(Z, Z^2) = E(Z^3) = 0$.

If $|\rho(X_1, X_2)| = 1$, then this is equivalent to saying that $X_2$ and $X_1$ are perfectly
linearly dependent, meaning that $X_2 = \alpha + \beta X_1$ almost surely for some $\alpha \in \mathbb{R}$ and
$\beta \ne 0$, with $\beta > 0$ for positive linear dependence and $\beta < 0$ for negative linear
dependence. Moreover, for $\beta_1, \beta_2 > 0$,

$$\rho(\alpha_1 + \beta_1 X_1, \alpha_2 + \beta_2 X_2) = \rho(X_1, X_2),$$

so correlation is invariant under strictly increasing linear transformations. However,
correlation is not invariant under nonlinear strictly increasing transformations
$T : \mathbb{R} \to \mathbb{R}$. For two real-valued rvs we have, in general, $\rho(T(X_1), T(X_2)) \ne
\rho(X_1, X_2)$.
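
The following Python sketch illustrates both statements by simulation. It assumes, purely for illustration, a bivariate normal pair with correlation 0.7 and the increasing transformation $T(x) = e^x$; for that particular choice the correlation of the transformed pair is known in closed form as $(e^{\rho} - 1)/(e - 1)$. The sample size, the seed and the helper `pearson` are our own choices.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(42)
rho, n = 0.7, 20000
x1, x2 = [], []
for _ in range(n):
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    x1.append(z1)
    x2.append(rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)

base = pearson(x1, x2)
# Strictly increasing linear maps leave the correlation unchanged.
linear = pearson([3.0 + 2.0 * a for a in x1], [-1.0 + 0.5 * b for b in x2])
# A strictly increasing nonlinear map (here exp) changes it.
nonlinear = pearson([math.exp(a) for a in x1], [math.exp(b) for b in x2])
exact_nonlinear = math.expm1(rho) / math.expm1(1.0)  # lognormal closed form

print(base, linear, nonlinear, exact_nonlinear)
```

The first two printed values should coincide up to rounding, while the third should lie noticeably below them and close to the closed-form value of roughly 0.59.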

Another obvious, but important, remark is that correlation is only defined when 
the variances of $X_1$ and $X_2$ are finite. This restriction to finite-variance models is not
ideal for a dependence measure and can cause problems when we work with heavy- 
tailed distributions. For example, actuaries who model losses in different business 
lines with infinite-variance distributions may not describe the dependence of their 
risks using correlation. We will encounter similar examples in Section 13.1.4 on 
operational risk. 


Correlation fallacies. We now discuss further pitfalls in the use of correlation, 
which we present in the form of fallacies. We believe these fallacies are worth high- 
lighting because they illustrate the dangers of attempting to construct multivariate 
risk models starting from marginal distributions and estimates of the correlations 
between risks. The statements we make are true if we restrict our attention to ellip- 
tically distributed risk factors, but they are false in general. For background to these 
fallacies, alternative examples and a discussion of the relevance to multivariate 
Monte Carlo simulation, see Embrechts, McNeil and Straumann (2002). 


Fallacy 1. The marginal distributions and pairwise correlations of a random vector 
determine its joint distribution. 


It should already be clear to readers of this chapter that this is not true. Figure 7.4 
shows the key to constructing counterexamples. Suppose the rvs $X_1$ and $X_2$ have
continuous marginal distributions $F_1$ and $F_2$ and joint df $C(F_1(x_1), F_2(x_2))$ for
some copula $C$, and suppose their linear correlation is $\rho(X_1, X_2) = \rho$. It will
generally be possible to find an alternative copula $C_2 \ne C$ and to construct a
random vector $(Y_1, Y_2)$ with df $C_2(F_1(x_1), F_2(x_2))$ such that $\rho(Y_1, Y_2) = \rho$. The
following example illustrates this idea in a case where $\rho = 0$.


Example 7.26. Consider two rvs representing profits and losses on two portfolios. 
Suppose we are given the information that both risks have standard normal distri- 
butions and that their correlation is 0. We construct two random vectors that are 
consistent with this information. 

Figure 7.7. VaR for the risks $X_1 + X_2$ and $Y_1 + Y_2$ as described in Example 7.26. Both these
pairs have standard normal margins and a correlation of zero; $X_1$ and $X_2$ are independent,
whereas $Y_1$ and $Y_2$ are dependent.

Model 1 is the standard bivariate normal $X \sim N_2(0, I_2)$. Model 2 is constructed by
taking $V$ to be an independent discrete rv such that $P(V = 1) = P(V = -1) = 0.5$
and setting $(Y_1, Y_2) = (X_1, V X_1)$ with $X_1$ as in Model 1. This model obviously
also has normal margins and correlation zero; its copula is given by

$$C(u_1, u_2) = 0.5 \max(u_1 + u_2 - 1, 0) + 0.5 \min(u_1, u_2),$$


which is a mixture of the two-dimensional comonotonicity and countermonotonicity 
copulas. This could be roughly interpreted as representing two equiprobable states 
of the world: in one state financial outcomes in the two portfolios are comonotonic 
and we are certain to make money in both or lose money in both; in the other state 
they are countermonotonic and we will make money in one and lose money in the 
other. 

We can calculate analytically the distribution of the total losses $X_1 + X_2$ and
$Y_1 + Y_2$; the latter sum does not itself have a univariate normal distribution. For
$k > 0$ we get that

$$P(X_1 + X_2 > k) = \bar{\Phi}(k/\sqrt{2}), \qquad P(Y_1 + Y_2 > k) = \tfrac12 \bar{\Phi}(\tfrac12 k),$$

from which it follows that, for $\alpha > 0.75$,

$$F_{X_1 + X_2}^{\leftarrow}(\alpha) = \sqrt{2}\, \Phi^{-1}(\alpha), \qquad F_{Y_1 + Y_2}^{\leftarrow}(\alpha) = 2 \Phi^{-1}(2\alpha - 1).$$

In Figure 7.7 we see that the quantile of $Y_1 + Y_2$ dominates that of $X_1 + X_2$ for
probability levels above 93%. This example also illustrates that the VaR of a sum of
risks is clearly not determined by marginal distributions and pairwise correlations.
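
The two quantile formulas of Example 7.26 are easy to evaluate directly. The sketch below uses `statistics.NormalDist` from the Python standard library for $\Phi$ and $\Phi^{-1}$; the function names and the probability levels shown are illustrative choices of ours.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def quantile_independent_sum(alpha):
    """alpha-quantile of X1 + X2 ~ N(0, 2) in Model 1."""
    return math.sqrt(2.0) * STD_NORMAL.inv_cdf(alpha)


def quantile_mixture_sum(alpha):
    """alpha-quantile of Y1 + Y2 in Model 2; the closed form needs alpha > 0.75."""
    if not 0.75 < alpha < 1.0:
        raise ValueError("closed form only valid for 0.75 < alpha < 1")
    return 2.0 * STD_NORMAL.inv_cdf(2.0 * alpha - 1.0)


def tail_prob_mixture_sum(k):
    """P(Y1 + Y2 > k) for k > 0: half the mass sits at zero, half follows 2 * X1."""
    return 0.5 * (1.0 - STD_NORMAL.cdf(k / 2.0))


for alpha in (0.90, 0.95, 0.99):
    print(alpha, quantile_independent_sum(alpha), quantile_mixture_sum(alpha))
```

At the 90% level the independent model has the larger quantile, whereas at the 95% and 99% levels the dependent model dominates, even though both models share the same margins and the same zero correlation.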

The correlation of two risks does not depend only on their copula; if it did,
then Proposition 7.7 would imply that correlation would be invariant under strictly
increasing transformations, which is not the case. Correlation is also inextricably
linked to the marginal distributions of the risks, and this imposes certain constraints 
on the values that correlation can take. This is the subject of the second fallacy. 


Fallacy 2. For given univariate distributions $F_1$ and $F_2$ and any correlation value $\rho$
in $[-1, 1]$ it is always possible to construct a joint distribution $F$ with margins $F_1$
and $F_2$ and correlation $\rho$.


Again, this statement is true if $F_1$ and $F_2$ are the margins of an elliptical distribution,
but is in general false. The so-called attainable correlations can form a strict
subset of the interval $[-1, 1]$, as is shown in the next theorem. In the proof of the
theorem we require the formula of Höffding, which is given in the next lemma.


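Lemma 7.27. If $(X_1, X_2)$ has joint df $F$ and marginal dfs $F_1$ and $F_2$, then the
covariance of $X_1$ and $X_2$, when finite, is given by

$$\operatorname{cov}(X_1, X_2) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} (F(x_1, x_2) - F_1(x_1) F_2(x_2)) \, dx_1 \, dx_2. \tag{7.25}$$

Proof. Let $(X_1, X_2)$ have df $F$ and let $(\tilde{X}_1, \tilde{X}_2)$ be an independent copy (i.e. a
second pair with df $F$ independent of $(X_1, X_2)$). We have

$$2 \operatorname{cov}(X_1, X_2) = E((X_1 - \tilde{X}_1)(X_2 - \tilde{X}_2)).$$

We now use a useful identity that says that, for any $a \in \mathbb{R}$ and $b \in \mathbb{R}$, we can always
write $(a - b) = \int_{-\infty}^{\infty} (I_{\{b \le x\}} - I_{\{a \le x\}}) \, dx$, and we apply this to the random pairs
$(X_1 - \tilde{X}_1)$ and $(X_2 - \tilde{X}_2)$. We obtain

$$\begin{aligned}
2 \operatorname{cov}(X_1, X_2)
&= E\left( \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} (I_{\{\tilde{X}_1 \le x_1\}} - I_{\{X_1 \le x_1\}})(I_{\{\tilde{X}_2 \le x_2\}} - I_{\{X_2 \le x_2\}}) \, dx_1 \, dx_2 \right) \\
&= 2 \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} (P(X_1 \le x_1, X_2 \le x_2) - P(X_1 \le x_1) P(X_2 \le x_2)) \, dx_1 \, dx_2.
\end{aligned}$$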

Theorem 7.28 (attainable correlations). Let $(X_1, X_2)$ be a random vector with
finite-variance marginal dfs $F_1$ and $F_2$ and an unspecified joint df; assume also that
$\operatorname{var}(X_1) > 0$ and $\operatorname{var}(X_2) > 0$. The following statements hold.


(1) The attainable correlations form a closed interval $[\rho_{\min}, \rho_{\max}]$ with $\rho_{\min} <
0 < \rho_{\max}$.

(2) The minimum correlation $\rho = \rho_{\min}$ is attained if and only if $X_1$ and $X_2$ are
countermonotonic. The maximum correlation $\rho = \rho_{\max}$ is attained if and
only if $X_1$ and $X_2$ are comonotonic.

(3) $\rho_{\min} = -1$ if and only if $X_1$ and $-X_2$ are of the same type (see Section A.1.1),
and $\rho_{\max} = 1$ if and only if $X_1$ and $X_2$ are of the same type.

Proof. We begin with (2) and use the identity (7.25). We also recall the two-dimensional
Fréchet bounds for a general df in (7.6):

$$\max(F_1(x_1) + F_2(x_2) - 1, 0) \le F(x_1, x_2) \le \min(F_1(x_1), F_2(x_2)).$$

Clearly, when $F_1$ and $F_2$ are fixed, the integrand in (7.25) is maximized pointwise
when $X_1$ and $X_2$ have the Fréchet upper bound copula $C(u_1, u_2) = \min(u_1, u_2)$,
i.e. when they are comonotonic. Similarly, the integrand is minimized when $X_1$ and
$X_2$ are countermonotonic.

To complete the proof of (1), note that clearly $\rho_{\max} \ge 0$. However, $\rho_{\max} = 0$ can
be ruled out since this would imply that $\min(F_1(x_1), F_2(x_2)) = F_1(x_1) F_2(x_2)$ for
all $x_1, x_2$. This can only occur if $F_1$ or $F_2$ is a degenerate distribution consisting of
point mass at a single point, but this is excluded by the assumption that variances
are non-zero. By a similar argument we have that $\rho_{\min} < 0$. If $W(F_1, F_2)$ and
$M(F_1, F_2)$ denote the Fréchet lower and upper bounds, respectively, then the mixture
$\lambda W(F_1, F_2) + (1 - \lambda) M(F_1, F_2)$, $0 \le \lambda \le 1$, has correlation $\lambda \rho_{\min} + (1 - \lambda) \rho_{\max}$.
Thus for any $\rho \in [\rho_{\min}, \rho_{\max}]$ we can set $\lambda = (\rho_{\max} - \rho)/(\rho_{\max} - \rho_{\min})$ to construct
a joint df that attains the correlation value $\rho$.

Part (3) is clear since $\rho_{\min} = -1$ or $\rho_{\max} = 1$ if and only if there is an almost
sure linear relationship between $X_1$ and $X_2$.


Example 7.29 (attainable correlations for lognormal rvs). An example where the
maximal and minimal correlations can be easily calculated occurs when $\ln X_1 \sim
N(0, 1)$ and $\ln X_2 \sim N(0, \sigma^2)$. For $\sigma \ne 1$ the lognormally distributed rvs $X_1$ and
$X_2$ are not of the same type (although $\ln X_1$ and $\ln X_2$ are) so that, by part (3) of
Theorem 7.28, we have $\rho_{\max} < 1$. The rvs $X_1$ and $-X_2$ are also not of the same
type, so $\rho_{\min} > -1$.

To calculate the actual boundaries of the attainable interval let $Z \sim N(0, 1)$ and
observe that if $X_1$ and $X_2$ are comonotonic, then $(X_1, X_2) \overset{d}{=} (e^{Z}, e^{\sigma Z})$. Clearly,
$\rho_{\max} = \rho(e^{Z}, e^{\sigma Z})$ and, by a similar argument, $\rho_{\min} = \rho(e^{Z}, e^{-\sigma Z})$. The analytical
calculation now follows easily and yields

$$\rho_{\min} = \frac{e^{-\sigma} - 1}{\sqrt{(e - 1)(e^{\sigma^2} - 1)}}, \qquad \rho_{\max} = \frac{e^{\sigma} - 1}{\sqrt{(e - 1)(e^{\sigma^2} - 1)}}.$$

See Figure 7.8 for an illustration of the attainable correlation interval for different
values of $\sigma$ and note how the boundaries of the interval both tend rapidly to zero as
$\sigma$ is increased. This shows, for example, that we can have situations where comonotonic
rvs have very small correlation values. Since comonotonicity is the strongest
form of positive dependence, this provides a correction to the widely held view that
small correlations always imply weak dependence.
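
The boundaries in Example 7.29 can be tabulated with a few lines of Python. This is only a sketch of the two closed-form expressions; the function name `attainable_correlations` and the grid of $\sigma$ values are our own choices.

```python
import math


def attainable_correlations(sigma):
    """Return (rho_min, rho_max) for ln X1 ~ N(0, 1) and ln X2 ~ N(0, sigma^2)."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    # expm1(x) = e^x - 1, evaluated accurately for small x
    denom = math.sqrt((math.e - 1.0) * math.expm1(sigma ** 2))
    return math.expm1(-sigma) / denom, math.expm1(sigma) / denom


for sigma in (0.5, 1.0, 2.0, 3.0, 4.0):
    rho_min, rho_max = attainable_correlations(sigma)
    print(f"sigma={sigma}: [{rho_min:.4f}, {rho_max:.4f}]")
```

For $\sigma = 1$ the two rvs are of the same type and the upper boundary equals one, while the lower boundary is $-1/e$; for $\sigma = 4$ even the comonotonic pair has a correlation below 0.02.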


Fallacy 3. For rvs $X_1 \sim F_1$ and $X_2 \sim F_2$ and for given $\alpha$, the quantile of the sum
$F_{X_1 + X_2}^{\leftarrow}(\alpha)$ is maximized when the joint distribution $F$ has maximal correlation.


While once again this is true if (X1, X2) are jointly elliptical, the statement is not 
true in general and any example of the superadditivity of the quantile function (or 
VaR risk measure) yields a counterexample. 

Figure 7.8. Maximum and minimum attainable correlations for lognormal rvs $X_1$ and $X_2$,
where $\ln X_1$ is standard normal and $\ln X_2$ is normal with mean 0 and variance $\sigma^2$.

In a superadditive VaR example, we have, for some value of $\alpha$, that

$$F_{X_1 + X_2}^{\leftarrow}(\alpha) > F_{X_1}^{\leftarrow}(\alpha) + F_{X_2}^{\leftarrow}(\alpha), \tag{7.26}$$

but, by Proposition 7.20, the right-hand side of this inequality is equal to $F_{Y_1 + Y_2}^{\leftarrow}(\alpha)$
for a pair of comonotonic rvs $(Y_1, Y_2)$ with $Y_1 \overset{d}{=} X_1$ and $Y_2 \overset{d}{=} X_2$. Moreover, by
part (2) of Theorem 7.28, $(Y_1, Y_2)$ will attain the maximal correlation $\rho_{\max}$ and
$\rho_{\max} > \rho(X_1, X_2)$. A simple example where this occurs is as follows.


Example 7.30. Let $X_1 \sim \text{Exp}(1)$ and $X_2 \sim \text{Exp}(1)$ be two independent standard
exponential rvs. Let $Y_1 = Y_2 = X_1$ and take $\alpha = 0.7$. Since $X_1 + X_2 \sim \text{Ga}(2, 1)$
(see Appendix A.2.4) it is easily checked that

$$F_{X_1 + X_2}^{\leftarrow}(\alpha) > F_{X_1}^{\leftarrow}(\alpha) + F_{X_2}^{\leftarrow}(\alpha) = F_{Y_1 + Y_2}^{\leftarrow}(\alpha)$$

but $\rho(X_1, X_2) = 0$ and $\rho(Y_1, Y_2) = 1$. This example is also discussed in Section 8.3.3.
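
The inequality in Example 7.30 can be verified numerically. The sketch below uses the df of the Ga(2, 1) distribution, $1 - e^{-x}(1 + x)$ for $x > 0$, and inverts it by bisection; the bisection routine, its bracket and its tolerance are our own illustrative choices rather than a method taken from the text.

```python
import math


def gamma2_cdf(x):
    """Distribution function of Ga(2, 1), the law of a sum of two iid Exp(1) rvs."""
    return 1.0 - math.exp(-x) * (1.0 + x) if x > 0 else 0.0


def quantile_by_bisection(cdf, alpha, lo=0.0, hi=50.0, tol=1e-12):
    """Smallest x in [lo, hi] with cdf(x) >= alpha, for a continuous increasing cdf."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cdf(mid) < alpha:
            lo = mid
        else:
            hi = mid
    return hi


alpha = 0.7
quantile_independent = quantile_by_bisection(gamma2_cdf, alpha)
quantile_comonotonic = 2.0 * (-math.log(1.0 - alpha))  # twice the Exp(1) quantile
print(quantile_independent, quantile_comonotonic)
```

The quantile of the independent sum comes out at roughly 2.44, above the comonotonic value of roughly 2.41, so the uncorrelated pair has the larger VaR at this level.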


In Section 8.4.4 we will look at the problem of discovering how “bad” the quantile 
of the sum of the two risks in (7.26) can be when the marginal distributions are 
known. 

A common message can be extracted from the fallacies of this section: namely 
that the concept of correlation is meaningless unless applied in the context of a 
well-defined joint model. Any interpretation of correlation values in the absence of 
such a model should be avoided. 


7.2.3 Rank Correlation 


Rank correlations are simple scalar measures of dependence that depend only on 
the copula of a bivariate distribution and not on the marginal distributions, unlike 
linear correlation, which depends on both. The standard empirical estimators of rank 
correlation may be calculated by looking at the ranks of the data alone, hence the 
name. In other words, we only need to know the ordering of the sample for each
variable of interest and not the actual numerical values. 

The main practical reason for looking at rank correlations is that they can be 
used to calibrate copulas to empirical data. At a theoretical level, being direct func- 
tionals of the copula, rank correlations have more appealing properties than linear 
correlations, as is discussed below. There are two main varieties of rank correlation, 
Kendall’s and Spearman’s, and both can be understood as a measure of concordance
for bivariate random vectors. Two points in $\mathbb{R}^2$, denoted by $(x_1, x_2)$ and
$(\tilde{x}_1, \tilde{x}_2)$, are said to be concordant if $(x_1 - \tilde{x}_1)(x_2 - \tilde{x}_2) > 0$ and to be discordant
if $(x_1 - \tilde{x}_1)(x_2 - \tilde{x}_2) < 0$.


Kendall’s tau. Consider a random vector $(X_1, X_2)$ and an independent copy
$(\tilde{X}_1, \tilde{X}_2)$ (i.e. a second vector with the same distribution, but independent of the
first). If $X_2$ tends to increase with $X_1$, then we expect the probability of concordance
to be high relative to the probability of discordance; if $X_2$ tends to decrease
with increasing $X_1$, then we expect the opposite. This motivates Kendall’s rank correlation,
which is simply the probability of concordance minus the probability of
discordance for these pairs:

$$\rho_\tau(X_1, X_2) = P((X_1 - \tilde{X}_1)(X_2 - \tilde{X}_2) > 0) - P((X_1 - \tilde{X}_1)(X_2 - \tilde{X}_2) < 0). \tag{7.27}$$


It is easily seen that there is a more compact way of writing this as an expectation, 
which also leads to an obvious estimator in Section 7.5.1. 


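Definition 7.31. For rvs $X_1$ and $X_2$ Kendall’s tau is given by

$$\rho_\tau(X_1, X_2) = E(\operatorname{sign}((X_1 - \tilde{X}_1)(X_2 - \tilde{X}_2))),$$

where $(\tilde{X}_1, \tilde{X}_2)$ is an independent copy of $(X_1, X_2)$ and $\operatorname{sign}(x) = I_{\{x > 0\}} - I_{\{x < 0\}}$.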
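
Replacing the expectation in Definition 7.31 by an average over all pairs of observations gives a natural sample version of Kendall's tau. The following Python sketch is a direct $O(n^2)$ implementation of that idea; it is our own minimal illustration and not necessarily the exact estimator of Section 7.5.1, and the small data set is invented for the purpose.

```python
import math


def sign(x):
    """sign(x) = 1 for x > 0, -1 for x < 0 and 0 for x = 0."""
    return (x > 0) - (x < 0)


def kendall_tau(xs, ys):
    """Average of sign((x_i - x_j)(y_i - y_j)) over all pairs i < j."""
    n = len(xs)
    total = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            total += sign((xs[i] - xs[j]) * (ys[i] - ys[j]))
    return total / (n * (n - 1) / 2)


xs = [0.3, 1.2, -0.5, 2.0, 0.9]
ys = [1.0, 0.4, -1.1, 3.0, 0.5]
tau = kendall_tau(xs, ys)
# Strictly increasing transformations do not change the ordering, hence not tau.
tau_transformed = kendall_tau([math.exp(x) for x in xs], [y ** 3 for y in ys])
print(tau, tau_transformed)
```

Only the signs of the pairwise differences enter the calculation, which is why the estimate is unaffected by the two increasing transformations applied in the last step.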


In higher dimensions the Kendall’s tau matrix of a random vector $X$ may be written
as $\rho_\tau(X) = \operatorname{cov}(Y)$, where $Y = \operatorname{sign}(X - \tilde{X})$ and $\tilde{X}$ is an independent copy of $X$;
note that $Y$ is obtained by the componentwise application of the sign function, so
that $Y = (Y_1, \dots, Y_d)'$, where $Y_i = \operatorname{sign}(X_i - \tilde{X}_i)$ for $i = 1, \dots, d$. Since $\rho_\tau(X)$
can be expressed as the covariance matrix of $Y$, it is obviously positive semidefinite.

We now show that, for random variables with continuous dfs, Kendall’s tau
depends only on the unique copula $C$ of $(X_1, X_2)$ and we give an explicit formula
for computing $\rho_\tau$ from $C$.


Proposition 7.32. Suppose $X_1$ and $X_2$ have continuous marginal distributions and
unique copula $C$. Then

$$\rho_\tau(X_1, X_2) = 4 \int_0^1 \int_0^1 C(u_1, u_2) \, dC(u_1, u_2) - 1. \tag{7.28}$$

Proof. Starting from (7.27) and writing $F_1$ and $F_2$ for the marginal dfs, we have

$$\begin{aligned}
\rho_\tau(X_1, X_2) &= 2 P((X_1 - \tilde{X}_1)(X_2 - \tilde{X}_2) > 0) - 1 \\
&= 4 P(X_1 < \tilde{X}_1, X_2 < \tilde{X}_2) - 1 \qquad (7.29) \\
&= 4 P(F_1(X_1) < F_1(\tilde{X}_1), F_2(X_2) < F_2(\tilde{X}_2)) - 1,
\end{aligned}$$

where the second equality follows from the interchangeability of the pairs $(X_1, X_2)$
and $(\tilde{X}_1, \tilde{X}_2)$ and the third equality from the continuity of $F_1$ and $F_2$. Introducing
the uniform random variables $U_i = F_i(X_i)$ and $\tilde{U}_i = F_i(\tilde{X}_i)$, $i = 1, 2$, and noting
that the df of the pairs $(U_1, U_2)$ and $(\tilde{U}_1, \tilde{U}_2)$ is $C$, we obtain

$$\begin{aligned}
\rho_\tau(X_1, X_2) &= 4 E(P(U_1 < \tilde{U}_1, U_2 < \tilde{U}_2 \mid \tilde{U}_1, \tilde{U}_2)) - 1 \\
&= 4 \int_0^1 \int_0^1 P(U_1 < u_1, U_2 < u_2) \, dC(u_1, u_2) - 1,
\end{aligned}$$

from which (7.28) follows.


Spearman’s rho. This measure can also be defined in terms of the concordance
and discordance of two bivariate random vectors, but this time we consider the independent
pairs $(X_1, X_2)$ and $(\tilde{X}_1, \tilde{X}_2)$ and assume that they have identical marginal
distributions but that the second pair is a pair of independent random variables.


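Definition 7.33. For rvs $X_1$ and $X_2$ Spearman’s rho is given by

$$\rho_S(X_1, X_2) = 3 (P((X_1 - \tilde{X}_1)(X_2 - \tilde{X}_2) > 0) - P((X_1 - \tilde{X}_1)(X_2 - \tilde{X}_2) < 0)), \tag{7.30}$$

where $\tilde{X}_1$ and $\tilde{X}_2$ are random variables satisfying $\tilde{X}_1 \overset{d}{=} X_1$ and $\tilde{X}_2 \overset{d}{=} X_2$ and where
$(X_1, X_2)$, $\tilde{X}_1$ and $\tilde{X}_2$ are all independent.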


It is not immediately apparent that this definition gives a sensible correlation 
measure, i.e. a number in the interval [—1, 1]. The following proposition gives an 
alternative representation, which makes this clear for continuous random variables. 


Proposition 7.34. If $X_1$ and $X_2$ have continuous marginal distributions $F_1$ and $F_2$,
then $\rho_S(X_1, X_2) = \rho(F_1(X_1), F_2(X_2))$.


Proof. If the random vectors $(X_1, X_2)$ and $(\tilde{X}_1, \tilde{X}_2)$ have continuous marginal distributions,
we may write

$$\begin{aligned}
\rho_S(X_1, X_2) &= 6 P((X_1 - \tilde{X}_1)(X_2 - \tilde{X}_2) > 0) - 3 \qquad (7.31) \\
&= 6 P((F_1(X_1) - F_1(\tilde{X}_1))(F_2(X_2) - F_2(\tilde{X}_2)) > 0) - 3 \\
&= 6 P((U_1 - \tilde{U}_1)(U_2 - \tilde{U}_2) > 0) - 3,
\end{aligned}$$

where $(U_1, U_2) := (F_1(X_1), F_2(X_2))$, $\tilde{U}_1 := F_1(\tilde{X}_1)$ and $\tilde{U}_2 := F_2(\tilde{X}_2)$, and $U_1$,
$U_2$, $\tilde{U}_1$ and $\tilde{U}_2$ all have standard uniform distributions. Conditioning on $U_1$ and $U_2$
we obtain

$$\begin{aligned}
\rho_S(X_1, X_2) &= 6 E(P((U_1 - \tilde{U}_1)(U_2 - \tilde{U}_2) > 0 \mid U_1, U_2)) - 3 \\
&= 6 E(P(\tilde{U}_1 < U_1, \tilde{U}_2 < U_2 \mid U_1, U_2) \\
&\qquad + P(\tilde{U}_1 > U_1, \tilde{U}_2 > U_2 \mid U_1, U_2)) - 3 \\
&= 6 E(U_1 U_2 + (1 - U_1)(1 - U_2)) - 3 \\
&= 12 E(U_1 U_2) - 6 E(U_1) - 6 E(U_2) + 3 \\
&= 12 \operatorname{cov}(U_1, U_2), \qquad (7.32)
\end{aligned}$$

where we have used the fact that $E(U_1) = E(U_2) = \tfrac12$. The conclusion that
$\rho_S(X_1, X_2) = \rho(F_1(X_1), F_2(X_2))$ follows from noting that $\operatorname{var}(U_1) = \operatorname{var}(U_2) =
\tfrac{1}{12}$.


In other words, Spearman’s rho, for continuous random variables, is simply
the linear correlation of their unique copula. The Spearman’s rho matrix for
the general multivariate random vector $X$ with continuous margins is given by
$\rho_S(X) = \rho(F_1(X_1), \dots, F_d(X_d))$ and must be positive semidefinite. In the bivariate
case, the formula for Spearman’s rho in terms of the copula $C$ of $(X_1, X_2)$ follows
from a simple application of Höffding’s formula (7.25) to formula (7.32).


Corollary 7.35. Suppose $X_1$ and $X_2$ have continuous marginal distributions and
unique copula $C$. Then

$$\rho_S(X_1, X_2) = 12 \int_0^1 \int_0^1 (C(u_1, u_2) - u_1 u_2) \, du_1 \, du_2. \tag{7.33}$$
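
Formula (7.33) lends itself to a simple numerical check. The sketch below approximates the double integral with a midpoint rule on a regular grid and applies it to the comonotonicity, countermonotonicity and independence copulas, for which Spearman's rho should be 1, −1 and 0, respectively. The grid size and the function names are illustrative assumptions of ours.

```python
def spearman_rho_from_copula(copula, n=200):
    """Midpoint-rule approximation of 12 * double integral of C(u1, u2) - u1 * u2."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        u1 = (i + 0.5) * h
        for j in range(n):
            u2 = (j + 0.5) * h
            total += copula(u1, u2) - u1 * u2
    return 12.0 * total * h * h


def comonotonicity(u1, u2):
    return min(u1, u2)


def countermonotonicity(u1, u2):
    return max(u1 + u2 - 1.0, 0.0)


def independence(u1, u2):
    return u1 * u2


for name, copula in (("M", comonotonicity), ("W", countermonotonicity), ("Pi", independence)):
    print(name, spearman_rho_from_copula(copula))
```

Any other bivariate copula with a closed form can be passed in the same way; the accuracy of the midpoint rule is limited near kinks of the copula, so the grid may need refining for precise values.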


Properties of rank correlation. Kendall’s tau and Spearman’s rho have many prop- 
erties in common. They are both symmetric dependence measures taking values in 
the interval [—1, 1]. They give the value zero for independent rvs, although a rank 
correlation of 0 does not necessarily imply independence. It can be shown that they 
take the value 1 if and only if $X_1$ and $X_2$ are comonotonic (see Embrechts, McNeil
and Straumann 2002) and the value $-1$ if and only if they are countermonotonic
(which contrasts with the behaviour of linear correlation observed in Theorem 7.28).
They are invariant under strictly increasing transformations of $X_1$ and $X_2$.

To what extent do the fallacies of linear correlation identified in Section 7.2.2 carry 
over to rank correlation? Clearly, Fallacy 1 remains relevant: marginal distributions 
and pairwise rank correlations do not fully determine the joint distribution of a vector 
of risks. Fallacy 3 also still applies when we switch from linear to rank correlations; 
although two comonotonic risks will have the maximum rank correlation value of 
one, this does not imply that the quantile of their sum is maximized over the class 
of all joint models with the same marginal distributions. 

However, Fallacy 2 no longer applies when we consider rank correlations: for 
any choice of continuous marginal distributions it is possible to specify a bivariate 
distribution that has any desired rank correlation value in [—1, 1]. One way of doing 
this is to take a convex combination of the form 


$$F(x_1, x_2) = \lambda W(F_1(x_1), F_2(x_2)) + (1 - \lambda) M(F_1(x_1), F_2(x_2)),$$

where $W$ and $M$ are the countermonotonicity and comonotonicity copulas, respectively.
A random pair $(X_1, X_2)$ with this df has rank correlation

$$\rho_\tau(X_1, X_2) = \rho_S(X_1, X_2) = 1 - 2\lambda,$$

which yields any desired value in $[-1, 1]$ for an appropriate choice of $\lambda$ in $[0, 1]$. But
this is only one of many possible constructions; a model with the Gauss copula of the
form $F(x_1, x_2) = C_\rho^{\text{Ga}}(F_1(x_1), F_2(x_2))$ can also be parametrized by an appropriate
choice of $\rho \in [-1, 1]$ to have any rank correlation in $[-1, 1]$. In Section 7.3.2 we
will calculate Spearman’s rho and Kendall’s tau values for the Gauss copula and 
other copulas of normal variance mixture distributions. 
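
The convex-combination construction above can be explored by simulation. In the sketch below we draw from the mixture copula by sampling a uniform variable $U$ and pairing it with $1 - U$ with probability $\lambda$ and with $U$ otherwise, and then compare sample versions of Kendall's tau and Spearman's rho with $1 - 2\lambda$. Uniform margins are used because rank correlations do not depend on the margins; the value $\lambda = 0.25$, the sample size, the seed and all helper names are our own assumptions.

```python
import random


def sign(x):
    return (x > 0) - (x < 0)


def kendall_tau(xs, ys):
    """Average sign of (x_i - x_j)(y_i - y_j) over all pairs i < j."""
    n = len(xs)
    total = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            total += sign((xs[i] - xs[j]) * (ys[i] - ys[j]))
    return total / (n * (n - 1) / 2)


def ranks(values):
    """Ranks 1..n of the observations (ties have probability zero here)."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def spearman_rho(xs, ys):
    """Sample Spearman's rho: the Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


def sample_mixture(n, lam, seed=0):
    """Draw from lam * W + (1 - lam) * M with uniform margins."""
    rng = random.Random(seed)
    us, vs = [], []
    for _ in range(n):
        u = rng.random()
        us.append(u)
        vs.append(1.0 - u if rng.random() < lam else u)
    return us, vs


lam = 0.25
us, vs = sample_mixture(1500, lam)
target = 1.0 - 2.0 * lam
tau_hat, rho_hat = kendall_tau(us, vs), spearman_rho(us, vs)
print(target, tau_hat, rho_hat)
```

Both estimates should lie close to the target value of 0.5, with the usual sampling error of a Monte Carlo experiment.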

7.2.4 Coefficients of Tail Dependence


Like the rank correlations, the coefficients of tail dependence are measures of pairwise
dependence that depend only on the copula of a pair of rvs $X_1$ and $X_2$ with
continuous marginal dfs. The motivation for looking at these coefficients is that they
provide measures of extremal dependence or, in other words, measures of the strength 
of dependence in the tails of a bivariate distribution. The coefficients we describe 
are defined in terms of limiting conditional probabilities of quantile exceedances. 
We note that there are a number of other definitions of tail-dependence measures in 
the literature (see Notes and Comments). 

In the case of upper tail dependence we look at the probability that $X_2$ exceeds
its $q$-quantile, given that $X_1$ exceeds its $q$-quantile, and then consider the limit as
$q$ goes to one. Obviously the roles of $X_1$ and $X_2$ are interchangeable. Formally we
have the following. 


Definition 7.36. Let $X_1$ and $X_2$ be rvs with dfs $F_1$ and $F_2$. The coefficient of upper
tail dependence of $X_1$ and $X_2$ is

$$\lambda_u := \lambda_u(X_1, X_2) = \lim_{q \to 1^-} P(X_2 > F_2^{\leftarrow}(q) \mid X_1 > F_1^{\leftarrow}(q)),$$

provided a limit $\lambda_u \in [0, 1]$ exists. If $\lambda_u \in (0, 1]$, then $X_1$ and $X_2$ are said to show
upper tail dependence or extremal dependence in the upper tail; if $\lambda_u = 0$, they are
asymptotically independent in the upper tail. Analogously, the coefficient of lower
tail dependence is

$$\lambda_l := \lambda_l(X_1, X_2) = \lim_{q \to 0^+} P(X_2 \le F_2^{\leftarrow}(q) \mid X_1 \le F_1^{\leftarrow}(q)),$$

provided a limit $\lambda_l \in [0, 1]$ exists.


If $F_1$ and $F_2$ are continuous dfs, then we get simple expressions for $\lambda_l$ and $\lambda_u$
in terms of the unique copula $C$ of the bivariate distribution. Using elementary
conditional probability and (7.4) we have

$$\lambda_l = \lim_{q \to 0^+} \frac{P(X_2 \le F_2^{\leftarrow}(q), X_1 \le F_1^{\leftarrow}(q))}{P(X_1 \le F_1^{\leftarrow}(q))} = \lim_{q \to 0^+} \frac{C(q, q)}{q}. \tag{7.34}$$

For upper tail dependence we use (7.14) to obtain

$$\lambda_u = \lim_{q \to 1^-} \frac{\hat{C}(1 - q, 1 - q)}{1 - q} = \lim_{q \to 0^+} \frac{\hat{C}(q, q)}{q}, \tag{7.35}$$

where $\hat{C}$ is the survival copula of $C$ (see (7.16)). For radially symmetric copulas we
must have $\lambda_l = \lambda_u$, since $C = \hat{C}$ for such copulas.

Calculation of these coefficients is straightforward if the copula in question has a 
simple closed form, as is the case for the Gumbel copula in (7.12) and the Clayton 
copula in (7.13). In Section 7.3.1 we will use a slightly more involved method 
to calculate tail-dependence coefficients for copulas of normal variance mixture 
distributions, such as the Gaussian and $t$ copulas.

Example 7.37 (Gumbel and Clayton copulas). Writing $\hat{C}_\theta^{\text{Gu}}$ for the Gumbel survival
copula we first use (7.16) to infer that

$$\lambda_u = \lim_{q \to 1^-} \frac{\hat{C}_\theta^{\text{Gu}}(1 - q, 1 - q)}{1 - q} = 2 - \lim_{q \to 1^-} \frac{C_\theta^{\text{Gu}}(q, q) - 1}{q - 1}.$$

We now use L’Hôpital’s rule and the fact that $C_\theta^{\text{Gu}}(u, u) = u^{2^{1/\theta}}$ to infer that

$$\lambda_u = 2 - \lim_{q \to 1^-} \frac{dC_\theta^{\text{Gu}}(q, q)}{dq} = 2 - 2^{1/\theta}.$$

Provided that $\theta > 1$, the Gumbel copula has upper tail dependence. The strength of
this tail dependence tends to 1 as $\theta \to \infty$, which is to be expected since the Gumbel
copula tends to the comonotonicity copula as $\theta \to \infty$. Using a similar technique
the coefficient of lower tail dependence for the Clayton copula may be shown to be
$\lambda_l = 2^{-1/\theta}$ for $\theta > 0$.
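
The two limits in Example 7.37 can be approximated by evaluating the ratios in (7.34) and (7.35) at values of $q$ close to the boundary. The sketch below uses the Gumbel diagonal $C(q, q) = q^{2^{1/\theta}}$ stated above and, as an assumption on our part, the standard Clayton form $C(u_1, u_2) = (u_1^{-\theta} + u_2^{-\theta} - 1)^{-1/\theta}$, which we take to agree with (7.13). The parameter value $\theta = 2$ and the evaluation points are illustrative.

```python
def gumbel_diagonal(q, theta):
    """Gumbel copula on the diagonal: C(q, q) = q ** (2 ** (1 / theta))."""
    return q ** (2.0 ** (1.0 / theta))


def clayton_diagonal(q, theta):
    """Clayton copula on the diagonal (standard form, assumed to match (7.13))."""
    return (2.0 * q ** (-theta) - 1.0) ** (-1.0 / theta)


def upper_tail_ratio(diagonal, q, theta):
    """Survival-copula ratio (1 - 2q + C(q, q)) / (1 - q) for q close to 1."""
    return (1.0 - 2.0 * q + diagonal(q, theta)) / (1.0 - q)


def lower_tail_ratio(diagonal, q, theta):
    """Ratio C(q, q) / q for q close to 0."""
    return diagonal(q, theta) / q


theta = 2.0
gumbel_upper = upper_tail_ratio(gumbel_diagonal, 1.0 - 1e-6, theta)
clayton_lower = lower_tail_ratio(clayton_diagonal, 1e-8, theta)
print(gumbel_upper, 2.0 - 2.0 ** (1.0 / theta))
print(clayton_lower, 2.0 ** (-1.0 / theta))
```

For $\theta = 2$ the numerical ratios should be close to $2 - \sqrt{2} \approx 0.586$ for the Gumbel copula and $1/\sqrt{2} \approx 0.707$ for the Clayton copula.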


The consequences of the lower tail dependence of the Clayton copula and the 
upper tail dependence of the Gumbel copula can be seen in Figures 7.3 and 7.4, 
where there is obviously an increased tendency for these copulas to generate joint 
extreme values in the respective corners. In Section 7.3.1 we will see that the Gauss 
copula is asymptotically independent in both tails, while the $t$ copula has both upper
and lower tail dependence of the same magnitude (due to its radial symmetry). 


Notes and Comments 


The concept of comonotonicity or perfect positive dependence is discussed by many 
authors, including Schmeidler (1986) and Yaari (1987). See also Wang and Dhaene 
(1998), whose proof we use in Proposition 7.18, and the entry in the Encyclopedia 
of Actuarial Science by Vyncke (2004). 

The discussion of correlation fallacies is based on Embrechts, McNeil and Strau- 
mann (2002), which contains a number of other examples illustrating these pitfalls. 
Throughout this book we make numerous references to this paper, which also played 
a major role in popularizing the copula concept mainly, but not solely, in finance, 
insurance and economics (see, for example, Genest, Gendron and Bourdeau-Brien 
2009). The ETH-RiskLab preprint of this paper was available as early as 1998, 
with a published abridged version appearing as Embrechts, McNeil and Straumann 
(1999). 

For Höffding’s formula and its use in proving the bounds on attainable correla- 
tions see Höffding (1940), Fréchet (1957) and Shea (1983). Useful references for 
rank correlations are Kruskal (1958) and Joag-Dev (1984). The relationship between 
rank correlation and copulas is discussed in Schweizer and Wolff (1981) and Nelsen 
(2006). The definition of tail dependence that we use stems from Joe (1993, 1997). 
There are a number of alternative definitions of tail-dependence measures, as dis- 
cussed, for example, in Coles, Heffernan and Tawn (1999). 

Important books that treat dependence concepts and emphasize links to copulas 
include Joe (1997), Denuit et al. (2005) and Rüschendorf (2013). 
7.3 Normal Mixture Copulas


A unique copula is contained in every multivariate distribution with continuous 
marginal distributions, and a useful class of parametric copulas are those contained 
in the multivariate normal mixture distributions of Section 6.2. We view these cop- 
ulas as particularly important in market-risk applications; indeed, in most cases, 
these copulas are used implicitly, without the user necessarily recognizing the fact. 
Whenever normal mixture distributions are fitted to multivariate return data or used 
as innovation distributions in multivariate time-series models, normal mixture cop- 
ulas are used. They are also found in a number of credit risk models, as we discuss 
in Section 12.2. 

In this section we first focus on normal variance mixture copulas; in Section 7.3.1 
we examine their tail-dependence properties; and in Section 7.3.2 we calculate rank 
correlation coefficients, which are useful for calibrating these copulas to data. Then, 
in Sections 7.3.3 and 7.3.4, we look at more exotic examples of copulas arising from 
multivariate normal mixture constructions. 


7.3.1 Tail Dependence 


Coefficients of tail dependence. Consider a pair of uniform rvs $(U_1, U_2)$ whose
distribution $C(u_1, u_2)$ is a normal variance mixture copula. Due to the radial
symmetry of $C$ (see Section 7.1.5), it suffices to consider the formula for the lower
tail-dependence coefficient in (7.34) to calculate the coefficient of tail dependence
$\lambda$ of $C$. By applying L'Hôpital's rule and using (7.17) we obtain

\[
\lambda = \lim_{q \to 0^+} \frac{\mathrm{d}C(q, q)}{\mathrm{d}q}
        = \lim_{q \to 0^+} P(U_2 \le q \mid U_1 = q) + \lim_{q \to 0^+} P(U_1 \le q \mid U_2 = q).
\]

Since $C$ is exchangeable, we have from (7.21) that

\[
\lambda = 2 \lim_{q \to 0^+} P(U_2 \le q \mid U_1 = q). \tag{7.36}
\]

We now show the interesting contrast between the Gaussian and $t$ copulas that we
alluded to in Example 7.13, namely that the $t$ copula has tail dependence whereas
the Gauss copula is asymptotically independent in the tail.


Example 7.38 (asymptotic independence of the Gauss copula). To evaluate the
tail-dependence coefficient for the Gauss copula $C^{\mathrm{Ga}}_\rho$, let
$(X_1, X_2) := (\Phi^{-1}(U_1), \Phi^{-1}(U_2))$, so that $(X_1, X_2)$ has a bivariate
normal distribution with standard margins and correlation $\rho$. It follows from (7.36) that

\[
\lambda = 2 \lim_{q \to 0^+} P(\Phi^{-1}(U_2) \le \Phi^{-1}(q) \mid \Phi^{-1}(U_1) = \Phi^{-1}(q))
        = 2 \lim_{x \to -\infty} P(X_2 \le x \mid X_1 = x).
\]

Using the fact that $X_2 \mid X_1 = x \sim N(\rho x, 1 - \rho^2)$, it can be calculated that

\[
\lambda = 2 \lim_{x \to -\infty} \Phi\bigl(x \sqrt{1 - \rho} / \sqrt{1 + \rho}\bigr) = 0,
\]

provided $\rho < 1$. Hence, the Gauss copula is asymptotically independent in both
tails. Regardless of how high a correlation we choose, if we go far enough into the
tail, extreme events appear to occur independently in each margin.

Table 7.1. Values of $\lambda$, the coefficient of upper and lower tail dependence, for the
$t$ copula $C^t_{\nu,\rho}$ for various values of $\nu$, the degrees of freedom, and $\rho$,
the correlation. The last row represents the Gauss copula.

  $\nu$ \ $\rho$     -0.5      0      0.5     0.9
       2             0.06    0.18    0.39    0.72
       4             0.01    0.08    0.25    0.63
      10             0.00    0.01    0.08    0.46
  $\infty$           0       0       0       0


Example 7.39 (asymptotic dependence of the $t$ copula). To evaluate the tail-dependence
coefficient for the $t$ copula $C^t_{\nu,\rho}$, let
$(X_1, X_2) := (t_\nu^{-1}(U_1), t_\nu^{-1}(U_2))$, where $t_\nu$ denotes the df of a
univariate $t$ distribution with $\nu$ degrees of freedom. Thus
$(X_1, X_2) \sim t_2(\nu, 0, P)$, where $P$ is a correlation matrix with off-diagonal
element $\rho$. By calculating the conditional density from the joint and marginal
densities of a bivariate $t$ distribution, it may be verified that, conditional on $X_1 = x$,

\[
\left( \frac{\nu + 1}{\nu + x^2} \right)^{1/2} \frac{X_2 - \rho x}{\sqrt{1 - \rho^2}} \sim t_{\nu+1}. \tag{7.37}
\]

Using an argument similar to Example 7.38 we find that

\[
\lambda = 2 t_{\nu+1}\left( -\sqrt{\frac{(\nu + 1)(1 - \rho)}{1 + \rho}} \right). \tag{7.38}
\]

Provided that $\rho > -1$, the copula of the bivariate $t$ distribution is asymptotically
dependent in both the upper and lower tails.

In Table 7.1 we tabulate the coefficient of tail dependence for various values of
$\nu$ and $\rho$. For fixed $\rho$ the strength of the tail dependence increases as $\nu$
decreases, and for fixed $\nu$ tail dependence increases as $\rho$ increases. Even for zero
or negative correlation values there is some tail dependence. This is not too surprising
and can be grasped intuitively by recalling from Section 6.2.1 that the $t$ distribution
is a normal mixture distribution with a mixing variable $W$ whose distribution is inverse
gamma (which is a heavy-tailed distribution): if $|X_1|$ is large, there is a good chance
that this is because $W$ is large, increasing the probability of $|X_2|$ being large.
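As a rough numerical check of (7.38), the following Python sketch evaluates the tail-dependence coefficient of the $t$ copula using only the standard library. The helper names `t_cdf` and `t_tail_dependence` are illustrative and do not come from the text; the $t$ df is approximated by Simpson's rule rather than taken from a statistics library, and the correlation values -0.5, 0, 0.5 and 0.9 are assumed to be the column values of Table 7.1.

```python
import math


def t_cdf(x, nu, n=20000):
    """Approximate the Student t df with nu degrees of freedom at x.

    Integrates the density over [0, |x|] with Simpson's rule (n must be even)
    and uses the symmetry of the density about zero.
    """
    log_c = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
             - 0.5 * math.log(nu * math.pi))

    def pdf(s):
        return math.exp(log_c - (nu + 1) / 2 * math.log1p(s * s / nu))

    a = abs(x)
    h = a / n
    total = pdf(0.0) + pdf(a)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * pdf(i * h)
    half = total * h / 3  # integral of the density over [0, |x|]
    return 0.5 + half if x >= 0 else 0.5 - half


def t_tail_dependence(nu, rho):
    """Coefficient of tail dependence of the t copula, formula (7.38); needs rho > -1."""
    arg = -math.sqrt((nu + 1) * (1 - rho) / (1 + rho))
    return 2 * t_cdf(arg, nu + 1)


# One row per value of nu, one entry per assumed correlation value.
for nu in (2, 4, 10):
    row = [round(t_tail_dependence(nu, rho), 2) for rho in (-0.5, 0.0, 0.5, 0.9)]
    print(nu, row)
```

Rounded to two decimals, the printed rows should agree with the corresponding rows of Table 7.1; for instance, $\nu = 4$ and $\rho = 0.5$ give a value of about 0.25.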


We could use the same method used in the previous examples to calculate tail- 
dependence coefficients for other copulas of normal variance mixtures. In doing so 
we would find that most examples, such as copulas of symmetric hyperbolic or NIG 
distributions, fell into the same category as the Gauss copula and were asymptotically 
independent in the tails. The essential determinant of whether the copula of a normal 
variance mixture has tail dependence or not is the tail of the distribution of the mixing 
variable W in Definition 6.4. If W has a distribution with a power tail, then we get
tail dependence, otherwise we get asymptotic independence. This is a consequence 
of a general result for elliptical distributions given in Section 16.1.3. 


Joint quantile exceedance probabilities. Coefficients of tail dependence are of 
course asymptotic quantities, and in the remainder of this section we look at joint 
exceedances of finite high quantiles for the Gauss and t copulas in order to learn more 
about the practical consequences of the differences between the extremal behaviours 
of these two models. 

As motivation we consider Figure 7.9, where 5000 simulated points from four different
distributions are displayed. The distributions in (a) and (b) are meta-Gaussian
distributions (see Section 7.1.3); they share the same copula $C^{\mathrm{Ga}}_\rho$. The
distributions in (c) and (d) are meta-$t$ distributions; they share the same copula
$C^t_{\nu,\rho}$. The values of $\nu$ and $\rho$ in all parts are 4 and 0.5, respectively.
The distributions in (a) and (c) share the same margins, namely standard normal margins.
The distributions in (b) and (d) both have Student $t$ margins with four degrees of
freedom. The distributions in (a) and (d) are, of course, elliptical, being a standard
bivariate normal and a bivariate $t$ distribution with four degrees of freedom; they both
have linear correlation $\rho = 0.5$. The other distributions are not elliptical and do not
necessarily have linear correlation 50%, since altering the margins alters the linear
correlation. All four distributions have identical Kendall's tau values (see
Proposition 7.43). The meta-Gaussian distributions have the same Spearman's rho value,
as do the meta-$t$ distributions, although the two values are not identical (see
Section 7.2.3).

The vertical and horizontal lines mark the true theoretical 0.005 and 0.995 quan- 
tiles for all distributions. Note that for the meta-t distributions the number of points 
that lie below both 0.005 quantiles or exceed both 0.995 quantiles is clearly greater 
than for the meta-Gaussian distributions, and this can be explained by the tail depend- 
ence of the t copula. The true theoretical ratio by which the number of these joint 
exceedances in the meta-t models should exceed the number in the meta-Gaussian 
models is 2.79, as may be read from Table 7.2, whose interpretation we now discuss. 

In Table 7.2 we have calculated values of $C^t_{\nu,\rho}(u, u) / C^{\mathrm{Ga}}_\rho(u, u)$
for various $\rho$ and $\nu$ and $u = 0.05, 0.01, 0.005, 0.001$. The rows marked Gauss
contain values of $C^{\mathrm{Ga}}_\rho(u, u)$, which is the probability that two rvs with
this copula are below their $u$-quantiles; we term this event a joint quantile exceedance
(thinking of exceedance in the downwards direction). It is obviously identical to the
probability that both rvs are larger than their $(1 - u)$-quantiles. The remaining rows
give the values of the ratio and thus express the amount by which the joint quantile
exceedance probabilities must be inflated when we move from models with a Gauss copula to
models with a $t$ copula.
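The Gauss rows of Table 7.2 can be approximated with a short Python sketch. It writes the joint exceedance probability as a one-dimensional integral, $C^{\mathrm{Ga}}_\rho(u, u) = \int_{-\infty}^{x} \phi(z)\, \Phi((x - \rho z)/\sqrt{1 - \rho^2})\, \mathrm{d}z$ with $x = \Phi^{-1}(u)$, which follows from conditioning on the first component. The function name `gauss_copula_diagonal`, the truncation point and the use of Simpson's rule are choices made for this illustration only.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def gauss_copula_diagonal(u, rho, n=4000, lower=-10.0):
    """Approximate C^Ga_rho(u, u) = P(X1 <= x, X2 <= x), where x = Phi^{-1}(u).

    Integrates phi(z) * Phi((x - rho * z) / sqrt(1 - rho^2)) over [lower, x]
    by Simpson's rule (n even); the mass below `lower` is negligible.
    """
    x = STD_NORMAL.inv_cdf(u)
    s = math.sqrt(1 - rho * rho)

    def integrand(z):
        return STD_NORMAL.pdf(z) * STD_NORMAL.cdf((x - rho * z) / s)

    h = (x - lower) / n
    total = integrand(lower) + integrand(x)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * integrand(lower + i * h)
    return total * h / 3


# Joint exceedance probabilities and the ratio C(u, u) / u for rho = 0.5.
for u in (0.05, 0.01, 0.005, 0.001):
    p = gauss_copula_diagonal(u, 0.5)
    print(u, p, p / u)
```

The last printed column, $C^{\mathrm{Ga}}_\rho(u, u)/u$, shrinks as $u$ decreases, which is the finite-quantile counterpart of the asymptotic independence of the Gauss copula.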

In Table 7.3 we extend Table 7.2 to higher dimensions. We now focus only on
joint exceedances of the 1% (or 99%) quantile(s). We tabulate values of the ratio
$C^t_{\nu,P}(u, \ldots, u) / C^{\mathrm{Ga}}_P(u, \ldots, u)$, where $P$ is an
equicorrelation matrix with all correlations equal to $\rho$. It is noticeable that not
only do these values grow as the correlation parameter or number of degrees of freedom
falls, but they also grow with the dimension of the copula. The next example gives an
interpretation of one of these numbers.

Figure 7.9. Five thousand simulated points from four distributions. (a) Standard bivariate
normal with correlation parameter $\rho = 0.5$. (b) Meta-Gaussian distribution with copula
$C^{\mathrm{Ga}}_\rho$ and Student $t$ margins with four degrees of freedom. (c) Meta-$t$
distribution with copula $C^t_{4,\rho}$ and standard normal margins. (d) Standard bivariate
$t$ distribution with four degrees of freedom and correlation parameter $\rho = 0.5$.
Horizontal and vertical lines mark the 0.005 and 0.995 quantiles. See Section 7.3.1 for a
commentary.

Table 7.2. Joint quantile exceedance probabilities for bivariate Gauss and $t$ copulas with
correlation parameter values of 0.5 and 0.7. For Gauss copulas the probability of joint
quantile exceedance is given; for the $t$ copulas the factors by which the Gaussian
probability must be multiplied are given.

                                              Quantile
  $\rho$   Copula   $\nu$    0.05           0.01           0.005          0.001
  0.5      Gauss             1.21 × 10^-2   1.29 × 10^-3   4.96 × 10^-4   5.42 × 10^-5
  0.5      t        8        1.20           1.65           1.94           3.01
  0.5      t        4        1.39           2.22           2.79           4.86
  0.5      t        3        1.50           2.55           3.26           5.83
  0.7      Gauss             1.95 × 10^-2   2.67 × 10^-3   1.14 × 10^-3   1.60 × 10^-4
  0.7      t        8        1.11           1.33           1.46           1.86
  0.7      t        4        1.21           1.60           1.82           2.52
  0.7      t        3        1.27           1.74           2.01           2.83

Table 7.3. Joint 1% quantile exceedance probabilities for multivariate Gaussian and $t$
equicorrelation copulas with correlation parameter values of 0.5 and 0.7. For Gauss
copulas the probability of joint quantile exceedance is given; for the $t$ copulas the
factors by which the Gaussian probability must be multiplied are given.

                                            Dimension $d$
  $\rho$   Copula   $\nu$    2              3              4              5
  0.5      Gauss             1.29 × 10^-3   3.66 × 10^-4   1.49 × 10^-4   7.48 × 10^-5
  0.5      t        8        1.65           2.36           3.09           3.82
  0.5      t        4        2.22           3.82           5.66           7.68
  0.5      t        3        2.55           4.72           7.35           10.34
  0.7      Gauss             2.67 × 10^-3   1.28 × 10^-3   7.77 × 10^-4   5.35 × 10^-4
  0.7      t        8        1.33           1.58           1.78           1.95
  0.7      t        4        1.60           2.10           2.53           2.91
  0.7      t        3        1.74           2.39           2.97           3.45


Example 7.40 (joint quantile exceedances: an interpretation). Consider daily
returns on five stocks. Suppose we are unsure about the best multivariate elliptical
model for these returns, but we believe that the correlation between any two returns
on the same day is 50%. If returns follow a multivariate Gaussian distribution, then
the probability that on any day all returns are below the 1% quantiles of their
respective distributions is $7.48 \times 10^{-5}$. In the long run, such an event will
happen once every 13 369 trading days on average: that is, roughly once every 51.4 years
(assuming 260 trading days in a year). On the other hand, if returns follow a multivariate
$t$ distribution with four degrees of freedom, then such an event will happen 7.68
times more often: that is, roughly once every 6.7 years. In the life of a risk manager,
50-year events and 7-year events have a very different significance.
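The arithmetic behind Example 7.40 can be reproduced in a few lines of Python. The probability and the inflation factor are the $d = 5$, $\rho = 0.5$ entries of Table 7.3; the variable and function names are illustrative only.

```python
TRADING_DAYS_PER_YEAR = 260   # convention used in Example 7.40
p_gauss = 7.48e-5             # Table 7.3: Gauss copula, rho = 0.5, d = 5
ratio_t4 = 7.68               # Table 7.3: t copula with nu = 4, rho = 0.5, d = 5


def waiting_time_years(prob_per_day, days_per_year=TRADING_DAYS_PER_YEAR):
    """Mean waiting time in years for an event with the given daily probability."""
    return 1.0 / prob_per_day / days_per_year


days_gauss = 1.0 / p_gauss
years_gauss = waiting_time_years(p_gauss)
years_t4 = waiting_time_years(p_gauss * ratio_t4)

print(round(days_gauss), round(years_gauss, 1), round(years_t4, 1))
# prints: 13369 51.4 6.7
```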


7.3.2 Rank Correlations 

To calculate rank correlations for normal variance mixture copulas we use the fol- 
lowing preliminary result for elliptical distributions. 

Proposition 7.41. Let $X \sim E_2(0, \Sigma, \psi)$ and $\rho = \wp(\Sigma)_{12}$, where
$\wp$ denotes the correlation operator in (6.5). Assume $P(X = 0) = 0$. Then

\[
P(X_1 > 0, X_2 > 0) = \frac{1}{4} + \frac{\arcsin \rho}{2\pi}.
\]

Proof. First we make a standardization of the variables and observe that if
$Y \sim E_2(0, P, \psi)$ and $P = \wp(\Sigma)$, then
$P(X_1 > 0, X_2 > 0) = P(Y_1 > 0, Y_2 > 0)$.
Now introduce a pair of spherical variates $Z \sim S_2(\psi)$; it follows that

\[
(Y_1, Y_2) \overset{d}{=} (Z_1, \rho Z_1 + \sqrt{1 - \rho^2}\, Z_2)
           = R(\cos\Theta, \rho\cos\Theta + \sqrt{1 - \rho^2}\,\sin\Theta),
\]

where $R$ is a positive radial rv and $\Theta$ is an independent, uniformly distributed
angle on $[-\pi, \pi)$ (see Section 6.3.1 and Theorem 6.21). Let $\phi = \arcsin\rho$ and
observe that $\sin\phi = \rho$ and $\cos\phi = \sqrt{1 - \rho^2}$. Since
$P(R = 0) = P(X = 0) = 0$ we conclude that

\[
P(X_1 > 0, X_2 > 0) = P(\cos\Theta > 0, \sin\phi\cos\Theta + \cos\phi\sin\Theta > 0)
                    = P(\cos\Theta > 0, \sin(\Theta + \phi) > 0).
\]

The angle $\Theta$ must jointly satisfy $\Theta \in (-\tfrac{1}{2}\pi, \tfrac{1}{2}\pi)$ and
$\Theta + \phi \in (0, \pi)$, and it is easily seen that for any value of $\phi$ this has
probability $(\tfrac{1}{2}\pi + \phi)/(2\pi)$, which gives the result.


Theorem 7.42 (rank correlations for the Gauss copula). Let $X$ have a bivariate
meta-Gaussian distribution with copula $C^{\mathrm{Ga}}_\rho$ and continuous margins. Then
the rank correlations are

\[
\rho_\tau(X_1, X_2) = \frac{2}{\pi} \arcsin\rho, \tag{7.39}
\]
\[
\rho_S(X_1, X_2) = \frac{6}{\pi} \arcsin\tfrac{1}{2}\rho. \tag{7.40}
\]

Proof. Since rank correlation is a copula property, we can of course simply assume
that $X \sim N_2(0, P)$, where $P$ is a correlation matrix with off-diagonal element
$\rho$; the calculations are then easy. For Kendall's tau, formula (7.29) implies

\[
\rho_\tau(X_1, X_2) = 4P(Y_1 > 0, Y_2 > 0) - 1,
\]

where $Y = \tilde{X} - X$ and $\tilde{X}$ is an independent copy of $X$. Since, by the
convolution property of the multivariate normal distribution in Section 6.1.3,
$Y \sim N_2(0, 2P)$, we have that $\rho(Y_1, Y_2) = \rho$ and formula (7.39) follows from
Proposition 7.41.

For Spearman's rho, let $Z = (Z_1, Z_2)'$ be a vector consisting of two independent
standard normal random variables and observe that formula (7.31) implies

\[
\rho_S(X_1, X_2) = 3(2P((X_1 - Z_1)(X_2 - Z_2) > 0) - 1)
                 = 3(4P(X_1 - Z_1 > 0, X_2 - Z_2 > 0) - 1)
                 = 3(4P(Y_1 > 0, Y_2 > 0) - 1),
\]

where $Y = X - Z$. Since $Y \sim N_2(0, P + I_2)$, the formula (7.40) follows from
Proposition 7.41 and the fact that $\rho(Y_1, Y_2) = \rho/2$.


These relationships between the rank correlations and $\rho$ are illustrated in
Figure 7.10. Note that the right-hand side of (7.40) may be approximated by the value
$\rho$ itself. This approximation turns out to be very accurate, as shown in the figure;
the error satisfies $|6\arcsin(\rho/2)/\pi - \rho| \le (\pi - 3)|\rho|/\pi$ and never
exceeds 0.0181 for $\rho \in [-1, 1]$.
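The relationships (7.39) and (7.40), and the quality of the approximation of Spearman's rho by the correlation parameter, are easy to check numerically. The sketch below uses illustrative function names; `rho_from_kendall_tau` simply inverts (7.39), which is the form one would use when calibrating the correlation parameter to a given value of Kendall's tau.

```python
import math


def kendall_tau_gauss(rho):
    """Kendall's tau of the Gauss copula, formula (7.39)."""
    return 2 / math.pi * math.asin(rho)


def spearman_rho_gauss(rho):
    """Spearman's rho of the Gauss copula, formula (7.40)."""
    return 6 / math.pi * math.asin(rho / 2)


def rho_from_kendall_tau(tau):
    """Invert (7.39): correlation parameter implied by a value of Kendall's tau."""
    return math.sin(math.pi * tau / 2)


# Largest gap between Spearman's rho and the correlation parameter on a grid.
grid = [i / 1000 for i in range(-1000, 1001)]
max_error = max(abs(spearman_rho_gauss(r) - r) for r in grid)
print(round(max_error, 4))
```

On this grid the largest gap is attained near $|\rho| \approx 0.59$ and is about 0.018.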

The relationship between Kendall's tau and the correlation parameter of the Gauss
copula $C^{\mathrm{Ga}}_\rho$ expressed by (7.39) holds more generally for the copulas of
all elliptical distributions that exclude point mass at their centre, including, for
example, the $t$ copula $C^t_{\nu,\rho}$. This is a consequence of the following result,
which was already used to derive an alternative correlation estimator for bivariate
distributions in Section 6.3.4.

Figure 7.10. The solid line shows the relationship between Spearman's rho and the
correlation parameter $\rho$ of the Gauss copula $C^{\mathrm{Ga}}_\rho$ for meta-Gaussian
rvs with continuous dfs; this is very close to the line $y = x$, which is just visible as
a dotted line. The dashed line shows the relationship between Kendall's tau and $\rho$;
this relationship holds for the copulas of other normal variance mixture distributions
with correlation parameter $\rho$, such as the $t$ copula $C^t_{\nu,\rho}$.


Proposition 7.43. Let $X \sim E_2(0, P, \psi)$ for a correlation matrix $P$ with
off-diagonal element $\rho$, and assume that $P(X = 0) = 0$. Then the relationship
$\rho_\tau(X_1, X_2) = (2/\pi) \arcsin\rho$ holds.

Proof. The result relies on the convolution property of elliptical distributions
in (6.47). Setting $Y = \tilde{X} - X$, where $\tilde{X}$ is an independent copy of $X$, we
note that $Y \sim E_2(0, P, \tilde{\psi})$ for some characteristic generator
$\tilde{\psi}$. We need to evaluate $\rho_\tau(X_1, X_2) = 4P(Y_1 > 0, Y_2 > 0) - 1$ as in
the proof of Theorem 7.42, but Proposition 7.41 shows that $P(Y_1 > 0, Y_2 > 0)$ takes the
same value whenever $Y$ is elliptical.


The relationship (7.40) between Spearman’s rho and the correlation parameter of 
the Gauss copula does not hold for the copulas of all elliptical distributions. We can, 
however, derive a formula for the copulas of normal variance mixture distributions 
based on the following result. 


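Proposition 7.44. Let $X \sim M_2(0, P, \hat{H})$ be distributed according to a normal
variance mixture distribution for a correlation matrix $P$ with off-diagonal element
$\rho$, and assume that $P(X = 0) = 0$. Then

\[
\rho_S(X_1, X_2) = \frac{6}{\pi} E\left( \arcsin\left(
    \frac{\rho W}{\sqrt{(W + \tilde{W})(W + \bar{W})}} \right) \right), \tag{7.41}
\]

where $W$, $\tilde{W}$ and $\bar{W}$ are independent random variables with df $H$ such that
the Laplace-Stieltjes transform of $H$ is $\hat{H}$.

Proof. Assume that $X = \sqrt{W} Z$, where $Z \sim N_2(0, P)$. Let $\tilde{Z}$ and
$\bar{Z}$ be standard normal variables and assume that $Z$, $\tilde{Z}$, $\bar{Z}$, $W$,
$\tilde{W}$ and $\bar{W}$ are all independent. Write
$\tilde{X} := \sqrt{\tilde{W}}\tilde{Z}$, $\bar{X} := \sqrt{\bar{W}}\bar{Z}$ and

\[
Y_1 := X_1 - \tilde{X} = \sqrt{W} Z_1 - \sqrt{\tilde{W}}\tilde{Z}, \qquad
Y_2 := X_2 - \bar{X} = \sqrt{W} Z_2 - \sqrt{\bar{W}}\bar{Z}.
\]

The result is proved by applying a similar approach to Theorem 7.42 and conditioning
on the variables $W$, $\tilde{W}$ and $\bar{W}$. We note that the conditional distribution
of $Y = (Y_1, Y_2)'$ satisfies

\[
Y \mid W, \tilde{W}, \bar{W} \sim N_2\left( 0,
    \begin{pmatrix} W + \tilde{W} & W\rho \\ W\rho & W + \bar{W} \end{pmatrix} \right).
\]

Using formula (7.31) we calculate

\[
\rho_S(X_1, X_2) = 6P((X_1 - \tilde{X})(X_2 - \bar{X}) > 0) - 3
                 = 3(2E(P(Y_1 Y_2 > 0 \mid W, \tilde{W}, \bar{W})) - 1)
                 = 3E(4P(Y_1 > 0, Y_2 > 0 \mid W, \tilde{W}, \bar{W}) - 1),
\]

and (7.41) is obtained by applying Proposition 7.41 and using the fact that, given
$W$, $\tilde{W}$ and $\bar{W}$,

\[
\rho(Y_1, Y_2) = \frac{\rho W}{\sqrt{(W + \tilde{W})(W + \bar{W})}}.
\]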


The formula (7.41) reduces to the formula (7.40) in the case where
$W = \tilde{W} = \bar{W} = k$ for some positive constant $k$. In general, Spearman's rho
for the copulas of normal variance mixtures can be calculated accurately by approximating
the integral in (7.41) using Monte Carlo. For example, to calculate Spearman's rho for
the $t$ copula $C^t_{\nu,\rho}$ we would generate a set of inverse gamma variates
$\{W_j, \tilde{W}_j, \bar{W}_j : j = 1, \ldots, m\}$ for $m$ large, such that each variable
in the set had an independent $\mathrm{Ig}(\tfrac{1}{2}\nu, \tfrac{1}{2}\nu)$ distribution.
We would then use

\[
\rho_S(X_1, X_2) \approx \frac{6}{m\pi} \sum_{j=1}^{m} \arcsin\left(
    \frac{\rho W_j}{\sqrt{(W_j + \tilde{W}_j)(W_j + \bar{W}_j)}} \right). \tag{7.42}
\]
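The Monte Carlo estimator (7.42) can be sketched in Python as follows. The function names, the default sample size and the seed are assumptions of this illustration. An $\mathrm{Ig}(\tfrac{1}{2}\nu, \tfrac{1}{2}\nu)$ variate is generated here as the reciprocal of a gamma variate with shape $\tfrac{1}{2}\nu$ and rate $\tfrac{1}{2}\nu$; note that `random.gammavariate` takes a scale parameter, so the rate $\tfrac{1}{2}\nu$ is passed as the scale $2/\nu$.

```python
import math
import random


def inverse_gamma(nu, rng):
    """One Ig(nu/2, nu/2) variate: reciprocal of a gamma(shape nu/2, scale 2/nu) variate."""
    return 1.0 / rng.gammavariate(nu / 2, 2 / nu)


def spearman_rho_t(nu, rho, m=100_000, seed=0):
    """Monte Carlo estimate (7.42) of Spearman's rho for the t copula."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(m):
        w = inverse_gamma(nu, rng)
        w_tilde = inverse_gamma(nu, rng)
        w_bar = inverse_gamma(nu, rng)
        total += math.asin(rho * w / math.sqrt((w + w_tilde) * (w + w_bar)))
    return 6 / (m * math.pi) * total


# Compare the t copula (nu = 4) with the Gauss copula value from (7.40).
print(spearman_rho_t(4, 0.5, m=20_000), 6 / math.pi * math.asin(0.25))
```

For very large $\nu$ the mixing variables concentrate near 1 and the estimate approaches the Gauss copula value $(6/\pi)\arcsin(\rho/2)$, which provides a simple sanity check on an implementation.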


7.3.3 Skewed Normal Mixture Copulas 


A skewed normal mixture copula is the copula of any normal mixture distribution 
that is not elliptically symmetric. An example is provided by the skewed t copula, 
which is the copula of the distribution whose density is given in (6.31). 

A random vector $X$ with a skewed $t$ distribution and $\nu$ degrees of freedom is
denoted by $X \sim \mathrm{GH}_d(-\tfrac{1}{2}\nu, \nu, 0, \mu, \Sigma, \gamma)$ in the
notation of Section 6.2.3. Its marginal distributions satisfy
$X_i \sim \mathrm{GH}_1(-\tfrac{1}{2}\nu, \nu, 0, \mu_i, \Sigma_{ii}, \gamma_i)$ (from
Proposition 6.13) and its copula depends on $\nu$, $P = \wp(\Sigma)$ and $\gamma$ and will
be denoted by $C^t_{\nu,P,\gamma}$ or, in the bivariate case,
$C^t_{\nu,\rho,\gamma_1,\gamma_2}$. Random sampling from the skewed $t$ copula follows the
general approach of Algorithm 7.10.

Algorithm 7.45 (simulation of the skewed $t$ copula).

(1) Generate $X \sim \mathrm{GH}_d(-\tfrac{1}{2}\nu, \nu, 0, 0, P, \gamma)$ using
Algorithm 6.10.

(2) Return $U = (F_1(X_1), \ldots, F_d(X_d))'$, where $F_i$ is the distribution function of
a $\mathrm{GH}_1(-\tfrac{1}{2}\nu, \nu, 0, 0, 1, \gamma_i)$ distribution. The random vector
$U$ has df $C^t_{\nu,P,\gamma}$.

Note that the evaluation of $F_i$ requires the numerical integration of the density of a
skewed univariate $t$ distribution.


To appreciate the flexibility of the skewed $t$ copula it suffices to consider the
bivariate case for different values of the skewness parameters $\gamma_1$ and $\gamma_2$.
In Figure 7.11 we have plotted simulated points from nine different examples of this
copula. Part (e) corresponds to the case when $\gamma_1 = \gamma_2 = 0$ and is thus the
ordinary $t$ copula. All other pictures show copulas that are non-radially symmetric (see
Section 7.1.5), as is obvious by rotating each picture 180° about the point
$(\tfrac{1}{2}, \tfrac{1}{2})$; (c), (e) and (g) show exchangeable copulas satisfying
(7.20), while the remaining six are non-exchangeable.

Obviously, the main advantage of the skewed $t$ copula over the ordinary $t$ copula is
that its asymmetry allows us to have different levels of tail dependence in “opposite
corners” of the distribution. In the context of market risk it is often claimed that joint
negative returns on stocks show more tail dependence than joint positive returns.


7.3.4 Grouped Normal Mixture Copulas 


Technically speaking, a grouped normal mixture copula is not itself the copula of a 
normal mixture distribution, but rather a way of attaching together a set of normal 
mixture copulas. We will illustrate the idea by considering the grouped t copula. 
Here, the basic idea is to construct a copula for a random vector X such that certain 
subvectors of X have t copulas but quite different levels of tail dependence. 

We create a distribution using a generalization of the variance-mixing construction
$X = \sqrt{W} Z$ in (6.18). Rather than multiplying all components of a correlated
Gaussian vector $Z$ with the root of a single inverse-gamma-distributed variate $W$,
as in Example 6.7, we instead multiply different subgroups with different variates
$W_k$, where $W_k \sim \mathrm{Ig}(\tfrac{1}{2}\nu_k, \tfrac{1}{2}\nu_k)$ and the $W_k$ are
themselves comonotonic (see Section 7.2.1). We therefore create subgroups whose dependence
properties are described by $t$ copulas with different $\nu_k$ parameters. The groups may
even consist of a single member for each $\nu_k$ parameter, an idea that has been developed
by Luo and Shevchenko (2010) under the name of the generalized $t$ copula.

Like the $t$ copula, the skewed $t$ copula and anything based on a mixture of
multivariate normals, a grouped $t$ copula is easy to simulate and thus to use in Monte
Carlo risk studies; this has been a major motivation for its development. We formally
define the grouped $t$ copula by explaining in more detail how to generate a
random vector $U$ with that distribution.

Figure 7.11. Ten thousand simulated points from bivariate skewed $t$ copula
$C^t_{\nu,\rho,\gamma_1,\gamma_2}$ for $\nu = 5$, $\rho = 0.8$ and various values of the
parameters $\gamma = (\gamma_1, \gamma_2)'$: (a) $\gamma = (0.8, -0.8)'$;
(b) $\gamma = (0.8, 0)'$; (c) $\gamma = (0.8, 0.8)'$; (d) $\gamma = (0, -0.8)'$;
(e) $\gamma = (0, 0)'$; (f) $\gamma = (0, 0.8)'$; (g) $\gamma = (-0.8, -0.8)'$;
(h) $\gamma = (-0.8, 0)'$; and (i) $\gamma = (-0.8, 0.8)'$.


Algorithm 7.46 (simulation of the grouped $t$ copula).

(1) Generate independently $Z \sim N_d(0, P)$ and $U \sim U(0, 1)$.

(2) Partition $\{1, \ldots, d\}$ into $m$ subsets of sizes $s_1, \ldots, s_m$, and for
$k = 1, \ldots, m$ let $\nu_k$ be the degrees-of-freedom parameter associated with group $k$.

(3) Set $W_k = G_{\nu_k}^{-1}(U)$, where $G_\nu$ is the df of the univariate
$\mathrm{Ig}(\tfrac{1}{2}\nu, \tfrac{1}{2}\nu)$ distribution, so that $W_1, \ldots, W_m$ are
comonotonic and inverse-gamma-distributed variates.

(4) Construct vectors $X$ and $U$ by

\[
X = (\sqrt{W_1} Z_1, \ldots, \sqrt{W_1} Z_{s_1}, \sqrt{W_2} Z_{s_1+1}, \ldots,
     \sqrt{W_2} Z_{s_1+s_2}, \ldots, \sqrt{W_m} Z_d)',
\]
\[
U = (t_{\nu_1}(X_1), \ldots, t_{\nu_1}(X_{s_1}), t_{\nu_2}(X_{s_1+1}), \ldots,
     t_{\nu_2}(X_{s_1+s_2}), \ldots, t_{\nu_m}(X_d))'.
\]

The former has a grouped $t$ distribution and the latter is distributed according
to a grouped $t$ copula.

If we have an a priori idea of the desired group structure, we can calibrate the
grouped $t$ copula to data using a method based on Kendall's tau rank correlations.
The use of this method for the ordinary $t$ copula is described later in Section 7.5.1
and Example 7.56.


Notes and Comments 


The coefficient of tail dependence for the t copula was first derived in Embrechts, 
McNeil and Straumann (2002). A more general result for the copulas of elliptical 
distributions is given in Hult and Lindskog (2002) and will be discussed in Sec- 
tion 16.1.3. The formula for Kendall’s tau for elliptical distributions can be found 
in Lindskog, McNeil and Schmock (2003) and Fang and Fang (2002). 

Proposition 7.44 is due to Andrew D. Smith (personal correspondence), who has
also derived the attractive alternative formula

\[
\rho_S(X_1, X_2) = (6/\pi) E(\arcsin(\rho \sin(\Theta/2))),
\]

where $\Theta$ is the (random) angle in a triangle with side lengths
$(W^{-1} + \tilde{W}^{-1})$, $(W^{-1} + \bar{W}^{-1})$ and $(\tilde{W}^{-1} + \bar{W}^{-1})$,
taken opposite the side of length $(\tilde{W}^{-1} + \bar{W}^{-1})$.

The skewed $t$ copula was introduced in Demarta and McNeil (2005), which also
describes the grouped $t$ copula. The grouped $t$ copula and a method for its calibration
were first proposed in Daul et al. (2003). The special case of the grouped $t$ copula
with one member in each group has been investigated by Luo and Shevchenko (2010,
2012), who refer to this as a generalized $t$ copula.


7.4 Archimedean Copulas 


The Gumbel copula (7.12) and the Clayton copula (7.13) belong to the family of so- 
called Archimedean copulas, which has been very extensively studied. This family 
has proved useful for modelling portfolio credit risk, as will be seen in Example 11.4. 
In this section we look at the simple structure of these copulas and establish some 
of the properties that we will need. 


7.4.1 Bivariate Archimedean Copulas 


As well as the Gumbel and Clayton copulas, two further examples we consider are
the Frank copula,

\[
C^{\mathrm{Fr}}_\theta(u_1, u_2) = -\frac{1}{\theta} \ln\left( 1 +
    \frac{(e^{-\theta u_1} - 1)(e^{-\theta u_2} - 1)}{e^{-\theta} - 1} \right),
    \quad \theta \in \mathbb{R},
\]

and a two-parameter copula that we refer to as a generalized Clayton copula,

\[
C^{\mathrm{GC}}_{\theta,\delta}(u_1, u_2) =
    \bigl\{ ((u_1^{-\theta} - 1)^\delta + (u_2^{-\theta} - 1)^\delta)^{1/\delta} + 1 \bigr\}^{-1/\theta},
    \quad \theta \ge 0, \ \delta \ge 1.
\]

It may be verified that, provided the parameter $\theta$ lies in the ranges we have
specified in the copula definitions, all four examples that we have met have the form

\[
C(u_1, u_2) = \psi(\psi^{-1}(u_1) + \psi^{-1}(u_2)), \tag{7.43}
\]

where $\psi: [0, \infty) \to [0, 1]$ is a decreasing and continuous function that satisfies
the conditions $\psi(0) = 1$ and $\lim_{t \to \infty} \psi(t) = 0$. The function $\psi$ is
known as the generator of the copula. For example, for the Gumbel copula
$\psi(t) = \exp(-t^{1/\theta})$ for $\theta \ge 1$, and for the other copulas the
generators are given in Table 7.4.

Table 7.4. Table summarizing the generators, permissible parameter values and limiting
special cases for four selected Archimedean copulas. The case $\theta = 0$ should be taken
to mean the limit $\lim_{\theta \to 0} \psi_\theta(t)$. For the Clayton and Frank copulas
this limit is $e^{-t}$, which is the generator of the independence copula.

  Copula                            Generator $\psi(t)$                                    Parameter range                    $\psi \in \Psi_\infty$   Lower   Upper
  $C^{\mathrm{Gu}}_\theta$          $\exp(-t^{1/\theta})$                                  $\theta \ge 1$                     Yes                      $\Pi$   $M$
  $C^{\mathrm{Cl}}_\theta$          $\max((1 + \theta t)^{-1/\theta}, 0)$                  $\theta \ge -1$                    $\theta \ge 0$           $W$     $M$
  $C^{\mathrm{Fr}}_\theta$          $-\theta^{-1} \ln(1 + (e^{-\theta} - 1) e^{-t})$       $\theta \in \mathbb{R}$            $\theta \ge 0$           $W$     $M$
  $C^{\mathrm{GC}}_{\theta,\delta}$ $(1 + \theta t^{1/\delta})^{-1/\theta}$                $\theta \ge 0$, $\delta \ge 1$     Yes                      N/A     N/A
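The representation (7.43) can be checked numerically against the closed-form expressions. The Python sketch below encodes the Gumbel, Clayton and Frank generators of Table 7.4 together with their inverses (obtained by solving $\psi(t) = u$ for $t$) and builds each copula from its generator. The function names are illustrative, the Clayton case is restricted to $\theta > 0$ and the Frank case to $\theta \ne 0$.

```python
import math


def gumbel_generator(theta):
    """(psi, psi_inverse) for the Gumbel copula, theta >= 1."""
    def psi(t):
        return math.exp(-t ** (1 / theta))

    def psi_inv(u):
        return (-math.log(u)) ** theta

    return psi, psi_inv


def clayton_generator(theta):
    """(psi, psi_inverse) for the Clayton copula, restricted here to theta > 0."""
    def psi(t):
        return (1 + theta * t) ** (-1 / theta)

    def psi_inv(u):
        return (u ** (-theta) - 1) / theta

    return psi, psi_inv


def frank_generator(theta):
    """(psi, psi_inverse) for the Frank copula, theta != 0."""
    def psi(t):
        return -math.log1p(math.expm1(-theta) * math.exp(-t)) / theta

    def psi_inv(u):
        return -math.log(math.expm1(-theta * u) / math.expm1(-theta))

    return psi, psi_inv


def archimedean_copula(generator, u1, u2):
    """Evaluate C(u1, u2) = psi(psi^{-1}(u1) + psi^{-1}(u2)) as in (7.43)."""
    psi, psi_inv = generator
    return psi(psi_inv(u1) + psi_inv(u2))


u1, u2, theta = 0.3, 0.7, 2.0
clayton_closed_form = (u1 ** -theta + u2 ** -theta - 1) ** (-1 / theta)
print(archimedean_copula(clayton_generator(theta), u1, u2), clayton_closed_form)
```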

When we introduced the Clayton copula in (7.13) we insisted that its parameter
should be non-negative. In the table we also define a Clayton copula for
$-1 \le \theta < 0$. To accommodate this case, the generator is written
$\psi(t) = \max((1 + \theta t)^{-1/\theta}, 0)$. We observe that $\psi(t)$ is strictly
decreasing on $[0, -1/\theta)$ but $\psi(t) = 0$ on $[-1/\theta, \infty)$. To define the
generator inverse uniquely at zero we set
$\psi^{-1}(0) = \inf\{t : \psi(t) = 0\} = -1/\theta$.

In Table 7.4 we also give the lower and upper limits of the families as the parameter
$\theta$ goes to the boundaries of the parameter space. Both the Frank and Clayton copulas
are known as comprehensive copulas, since they interpolate between a lower limit
of countermonotonicity and an upper limit of comonotonicity. For a more extensive
table of Archimedean copulas see Nelsen (2006).

The following important theorem clarifies the conditions under which a function
$\psi$ is the generator of a bivariate Archimedean copula and allows us to define an
Archimedean copula generator formally.

Theorem 7.47 (bivariate Archimedean copula). Let $\psi: [0, \infty) \to [0, 1]$ be
a decreasing, continuous function that satisfies the conditions $\psi(0) = 1$ and
$\lim_{t \to \infty} \psi(t) = 0$. Then

\[
C(u_1, u_2) = \psi(\psi^{-1}(u_1) + \psi^{-1}(u_2)) \tag{7.44}
\]

is a copula if and only if $\psi$ is convex.


Proof. See Nelsen (2006, Theorem 4.1.4). 


Definition 7.48 (Archimedean copula generator). A decreasing, continuous, convex
function $\psi: [0, \infty) \to [0, 1]$ satisfying $\psi(0) = 1$ and
$\lim_{t \to \infty} \psi(t) = 0$ is known as an Archimedean copula generator.

Note that this definition automatically implies that $\psi$ is strictly decreasing at all
values $t$ for which $\psi(t) > 0$, but there may be a flat piece if $\psi$ attains the
value zero. This is the only point where there is ambiguity about the inverse
$\psi^{-1}$, and we set $\psi^{-1}(0) = \inf\{t : \psi(t) = 0\}$.

Table 7.5. Kendall's rank correlations and coefficients of tail dependence for the copulas
of Table 7.4. $D_1(\theta)$ is the Debye function
$D_1(\theta) = \theta^{-1} \int_0^\theta t/(e^t - 1)\, \mathrm{d}t$.

  Copula                            $\rho_\tau$                                     $\lambda_u$          $\lambda_l$
  $C^{\mathrm{Gu}}_\theta$          $1 - 1/\theta$                                  $2 - 2^{1/\theta}$   $0$
  $C^{\mathrm{Cl}}_\theta$          $\theta/(\theta + 2)$                           $0$                  $2^{-1/\theta}$ if $\theta > 0$; $0$ if $\theta \le 0$
  $C^{\mathrm{Fr}}_\theta$          $1 - 4\theta^{-1}(1 - D_1(\theta))$             $0$                  $0$
  $C^{\mathrm{GC}}_{\theta,\delta}$ $((2 + \theta)\delta - 2)/((2 + \theta)\delta)$ $2 - 2^{1/\delta}$   $2^{-1/(\theta\delta)}$

Kendall’s rank correlations can be calculated for Archimedean copulas directly 
from the generator inverse using Proposition 7.49 below. The formula obtained can 
be used to calibrate Archimedean copulas to empirical data using the sample version 
of Kendall’s tau, as we discuss in Section 7.5. 


Proposition 7.49. Let $X_1$ and $X_2$ be continuous rvs with unique Archimedean
copula $C$ generated by $\psi$. Then

\[
\rho_\tau(X_1, X_2) = 1 + 4 \int_0^1 \frac{\psi^{-1}(t)}{\mathrm{d}\psi^{-1}(t)/\mathrm{d}t}\, \mathrm{d}t. \tag{7.45}
\]

Proof. See Nelsen (2006, Corollary 5.1.4). 
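As an illustration of (7.45), the following Python sketch evaluates the integral with a simple midpoint rule for the Clayton and Gumbel generator inverses and compares the result with the closed forms $\theta/(\theta + 2)$ and $1 - 1/\theta$. The function name `kendall_tau_archimedean` is hypothetical, and the derivative of the generator inverse is supplied analytically by the caller rather than computed numerically.

```python
import math


def kendall_tau_archimedean(psi_inv, dpsi_inv, n=20000):
    """Evaluate (7.45) by the midpoint rule on (0, 1).

    psi_inv is the generator inverse and dpsi_inv its derivative.
    """
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * h
        total += psi_inv(t) / dpsi_inv(t)
    return 1 + 4 * total * h


theta = 2.0

# Clayton: psi^{-1}(t) = (t^{-theta} - 1)/theta, derivative -t^{-theta-1}.
tau_clayton = kendall_tau_archimedean(
    lambda t: (t ** -theta - 1) / theta,
    lambda t: -t ** (-theta - 1),
)

# Gumbel: psi^{-1}(t) = (-ln t)^theta, derivative -theta (-ln t)^(theta-1) / t.
tau_gumbel = kendall_tau_archimedean(
    lambda t: (-math.log(t)) ** theta,
    lambda t: -theta * (-math.log(t)) ** (theta - 1) / t,
)

print(tau_clayton, theta / (theta + 2))
print(tau_gumbel, 1 - 1 / theta)
```

In calibration one would go in the opposite direction: given a sample estimate of Kendall's tau, solve the closed-form relationship (or (7.45) numerically) for $\theta$.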


For the closed-form copulas of the Archimedean class, coefficients of tail dependence
are easily calculated using methods of the kind used in Example 7.37. Values
for Kendall's tau and the coefficients of tail dependence for the copulas of Table 7.4
are given in Table 7.5. It is interesting to note that the generalized Clayton copula
$C^{\mathrm{GC}}_{\theta,\delta}$ combines, in a sense, both Gumbel's family and Clayton's
family for positive parameter values, and thus succeeds in having tail dependence in both
tails.


7.4.2 Multivariate Archimedean Copulas 


It seems natural to attempt to construct a higher-dimensional Archimedean copula
according to

\[
C(u_1, \ldots, u_d) = \psi(\psi^{-1}(u_1) + \cdots + \psi^{-1}(u_d)), \tag{7.46}
\]

where $\psi$ is an Archimedean generator function as in Definition 7.48. However,
this construction may fail to define a proper distribution function for arbitrary
dimension $d$. An example where this occurs is obtained if we take the generator
$\psi(t) = 1 - t$, which is the Clayton generator for $\theta = -1$. In this case we obtain
the Fréchet lower bound for copulas, which is not itself a copula for $d > 2$.
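The failure of the construction for $\psi(t) = 1 - t$ can be made concrete by computing the mass that the resulting function assigns to a box. The Python sketch below (function names are illustrative) writes the generator as $\max(1 - t, 0)$ so that it maps into $[0, 1]$, and evaluates the box mass by inclusion-exclusion over the vertices. In dimension 3 the box $(\tfrac{1}{2}, 1]^3$ receives mass $1 - 3 \cdot \tfrac{1}{2} + 3 \cdot 0 - 0 = -\tfrac{1}{2}$, which is impossible for a distribution function.

```python
from itertools import product


def psi(t):
    """Clayton generator for theta = -1, written so that it maps into [0, 1]."""
    return max(1.0 - t, 0.0)


def psi_inv(u):
    return 1.0 - u


def c_psi(u):
    """Construction (7.46) for the generator above: max(u_1 + ... + u_d - d + 1, 0)."""
    return psi(sum(psi_inv(ui) for ui in u))


def box_mass(c, lower, upper):
    """Mass assigned by c to the box (lower, upper], by inclusion-exclusion."""
    d = len(lower)
    total = 0.0
    for picks in product((0, 1), repeat=d):
        vertex = [upper[i] if picks[i] else lower[i] for i in range(d)]
        sign = (-1) ** (d - sum(picks))
        total += sign * c(vertex)
    return total


print(box_mass(c_psi, [0.5, 0.5], [1.0, 1.0]))            # d = 2: prints 0.0
print(box_mass(c_psi, [0.5, 0.5, 0.5], [1.0, 1.0, 1.0]))  # d = 3: prints -0.5
```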
In order to guarantee that we will obtain a proper copula in any dimension we
have to impose the property of complete monotonicity on $\psi$. A decreasing function
$f(t)$ is completely monotonic on an interval $[a, b]$ if it satisfies

\[
(-1)^k \frac{\mathrm{d}^k}{\mathrm{d}t^k} f(t) \ge 0, \quad k \in \mathbb{N}, \quad t \in (a, b). \tag{7.47}
\]

Theorem 7.50. If $\psi: [0, \infty) \to [0, 1]$ is an Archimedean copula generator, then
the construction (7.46) gives a copula in any dimension $d$ if and only if $\psi$ is
completely monotonic.


Proof. See Kimberling (1974). 


If an Archimedean copula generator is completely monotonic, we write
$\psi \in \Psi_\infty$. A column in Table 7.4 shows the cases where the generators are
completely monotonic. For example, the Clayton generator is completely monotonic when
$\theta \ge 0$ and the $d$-dimensional Clayton copula takes the form

\[
C^{\mathrm{Cl}}_\theta(u) = (u_1^{-\theta} + \cdots + u_d^{-\theta} - d + 1)^{-1/\theta},
    \quad \theta \ge 0, \tag{7.48}
\]

where the limiting case $\theta = 0$ should be interpreted as the $d$-dimensional
independence copula.

The class of completely monotonic Archimedean copula generators is equivalent
to the class of Laplace-Stieltjes transforms of dfs $G$ on $[0, \infty)$ such that
$G(0) = 0$. Let $X$ be an rv with such a df $G$. We recall that the Laplace-Stieltjes
transform of $G$ (or $X$) is given by

\[
\hat{G}(t) = \int_0^\infty e^{-tx}\, \mathrm{d}G(x) = E(e^{-tX}), \quad t \ge 0. \tag{7.49}
\]

It is not difficult to verify that $\hat{G}: [0, \infty) \to [0, 1]$ is a continuous,
strictly decreasing function with the property of complete monotonicity (7.47). Moreover,
$\hat{G}(0) = 1$ and the exclusion of distributions with point mass at zero ensures
$\lim_{t \to \infty} \hat{G}(t) = 0$. $\hat{G}$ therefore provides a candidate for an
Archimedean generator that will generate a copula in any dimension.

This insight has a number of practical implications. On the one hand, we can create
a rich variety of Archimedean copulas by considering Laplace-Stieltjes transforms
of different distributions on $[0, \infty)$. On the other hand, we can derive a generic
method of sampling from Archimedean copulas based on the following result.


Proposition 7.51. Let $G$ be a df on $[0, \infty)$ satisfying $G(0) = 0$ with Laplace-Stieltjes
transform $\hat{G}$ as in (7.49). Let $V$ be an rv with df $G$ and let $Y_1, \ldots, Y_d$ be a
sequence of independent, standard exponential rvs that are also independent of $V$.
Then the following hold.


(i) The survival copula of the random vector $\boldsymbol{X} := (Y_1/V, \ldots, Y_d/V)$ is an
Archimedean copula $C$ with generator $\psi = \hat{G}$.


(ii) The random vector $\boldsymbol{U} := (\psi(X_1), \ldots, \psi(X_d))$ is distributed according to $C$.


(iii) The components of $\boldsymbol{U}$ are conditionally independent given $V$ with conditional
df $P(U_i \leq u \mid V = v) = \exp(-v\psi^{-1}(u))$.


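Proof. For part (i) we calculate that, for $\boldsymbol{x} \in \mathbb{R}^d_+$,

$$\begin{aligned}
P(X_1 > x_1, \ldots, X_d > x_d) &= \int_0^\infty \prod_{j=1}^d e^{-x_j v} \, \mathrm{d}G(v) \\
&= \int_0^\infty e^{-v(x_1 + \cdots + x_d)} \, \mathrm{d}G(v) \\
&= \hat{G}(x_1 + \cdots + x_d).
\end{aligned} \tag{7.50}$$

Since the marginal survival functions are given by $P(X_i > x) = \hat{G}(x)$ and $\hat{G}$ is
continuous and strictly decreasing, the result follows from writing

$$P(X_1 > x_1, \ldots, X_d > x_d) = \hat{G}\bigl(\hat{G}^{-1}(P(X_1 > x_1)) + \cdots + \hat{G}^{-1}(P(X_d > x_d))\bigr).$$

Part (ii) follows easily from (7.50) since

$$\begin{aligned}
P(\boldsymbol{U} \leq \boldsymbol{u}) &= P(U_1 \leq u_1, \ldots, U_d \leq u_d) \\
&= P(X_1 > \psi^{-1}(u_1), \ldots, X_d > \psi^{-1}(u_d)).
\end{aligned}$$

The conditional independence is obvious in part (iii) and we calculate that

$$\begin{aligned}
P(U_i \leq u \mid V = v) &= P(X_i > \psi^{-1}(u) \mid V = v) \\
&= P(Y_i > v\psi^{-1}(u)) \\
&= e^{-v\psi^{-1}(u)}.
\end{aligned}$$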


Because of the importance of such copulas, particularly in the field of credit risk, 
we will call these copulas LT-Archimedean (LT stands for “Laplace transform”) and 
make the following definition. 


Definition 7.52 (LT-Archimedean copula). An LT-Archimedean copula is a copula
of the form (7.46), where $\psi$ is the Laplace-Stieltjes transform of a df $G$ on $[0, \infty)$
satisfying $G(0) = 0$.


The sampling algorithm is based on parts (i) and (ii) of Proposition 7.51. We give
explicit instructions for the Clayton, Gumbel and Frank copulas.

Algorithm 7.53 (simulation of LT-Archimedean copulas).

(1) Generate a variate $V$ with df $G$ such that $\hat{G}$, the Laplace-Stieltjes transform
of $G$, is the generator $\psi$ of the required copula.


(2) Generate independent uniform variates $Z_1, \ldots, Z_d$ and set $Y_i = -\ln(Z_i)$ for
$i = 1, \ldots, d$ so that $Y_1, \ldots, Y_d$ are standard exponential.


(3) Return $\boldsymbol{U} = (\psi(Y_1/V), \ldots, \psi(Y_d/V))'$.
Figure 7.12. Pairwise scatterplots of 1000 simulated points from a four-dimensional
exchangeable Gumbel copula with $\theta = 2$. Data are simulated using Algorithm 7.53.


(a) For the special case of the Clayton copula we generate a gamma variate
$V \sim \mathrm{Ga}(1/\theta, 1)$ with $\theta > 0$ (see Section A.2.4). The df of $V$ has Laplace
transform $\hat{G}(t) = (1 + t)^{-1/\theta}$. This differs slightly from the Clayton generator
in Table 7.4 but we note that $\hat{G}(\theta t)$ and $\hat{G}(t)$ generate the same
Archimedean copula.


(b) For the special case of the Gumbel copula we generate a positive stable variate
$V \sim \mathrm{St}(1/\theta, 1, \gamma, 0)$, where $\gamma = (\cos(\pi/(2\theta)))^\theta$ and $\theta > 1$ (see Section A.2.9
for more details and a reference to a simulation algorithm). This df
has Laplace transform $\hat{G}(t) = \exp(-t^{1/\theta})$ as desired.


(c) For the special case of the Frank copula we generate a discrete rv $V$ with
probability mass function $p(k) = P(V = k) = (1 - e^{-\theta})^k/(k\theta)$ for $k = 1, 2, \ldots$
and $\theta > 0$. This can be achieved by standard simulation methods for
discrete distributions (see Ripley 1987, p. 71).
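
The three steps of Algorithm 7.53 can be sketched in a few lines of Python for the Clayton and Frank cases. This is a minimal illustration using only the standard library, not a reference implementation: the function names are hypothetical, the Frank generator $\psi(t) = -\theta^{-1}\ln(1 - (1 - e^{-\theta})e^{-t})$ is the Laplace transform obtained by summing the pmf in (c), and the discrete variate is drawn by plain inversion as one possible "standard simulation method". The Gumbel case is omitted because it needs a positive stable sampler.

```python
import math
import random


def sample_clayton(d, theta, rng):
    """One draw from the d-dimensional Clayton copula, theta > 0."""
    # Step (1): V ~ Ga(1/theta, 1); guard against a numerically zero variate.
    v = max(rng.gammavariate(1.0 / theta, 1.0), 1e-300)
    # Step (2): independent standard exponentials.
    ys = [rng.expovariate(1.0) for _ in range(d)]
    # Step (3): apply psi(t) = (1 + t)^(-1/theta), the Laplace transform of V.
    return [(1.0 + y / v) ** (-1.0 / theta) for y in ys]


def sample_log_series(p, rng):
    """Inversion sampling for P(V = k) = p^k / (k * (-ln(1 - p))), k = 1, 2, ..."""
    u = rng.random()
    k = 1
    prob = p / -math.log(1.0 - p)      # P(V = 1)
    cum = prob
    while u > cum and prob > 0.0:
        prob *= p * k / (k + 1)        # P(V = k + 1) obtained from P(V = k)
        k += 1
        cum += prob
    return k


def sample_frank(d, theta, rng):
    """One draw from the d-dimensional Frank copula, theta > 0."""
    p = 1.0 - math.exp(-theta)
    v = sample_log_series(p, rng)                      # step (1)
    ys = [rng.expovariate(1.0) for _ in range(d)]      # step (2)
    # Step (3): psi(t) = -ln(1 - p * exp(-t)) / theta.
    return [-math.log1p(-p * math.exp(-y / v)) / theta for y in ys]


rng = random.Random(2024)
clayton_sample = [sample_clayton(4, 2.0, rng) for _ in range(1000)]
frank_sample = [sample_frank(4, 5.0, rng) for _ in range(1000)]
```

Each row of `clayton_sample` or `frank_sample` is one simulated vector $\boldsymbol{U}$; because all components of a row share the same $V$, they are dependent, while the margins remain uniform.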


See Figure 7.12 for an example of data simulated from a four-dimensional Gumbel
copula using this algorithm. Note the upper tail dependence in each bivariate margin
of this copula.

While Archimedean copulas with completely monotonic generators (Laplace-Stieltjes
transforms) can be used in any dimension, if we are interested in
Archimedean copulas in a given dimension $d$, we can relax the requirement of
complete monotonicity and substitute the weaker requirement of $d$-monotonicity.
See Section 15.2.1 for more details.

A copula obtained from construction (7.46) is obviously an exchangeable cop- 
ula conforming to (7.20). While exchangeable bivariate Archimedean copulas are 
widely used in modelling applications, their exchangeable multivariate extensions 
represent a very specialized form of dependence structure and have more limited 
applications. An exception to this is in the area of credit risk, although even here 
more general models with group structures are also needed. It is certainly natu- 
ral to enquire whether there are extensions to the Archimedean class that are not 
rigidly exchangeable. We present some non-exchangeable Archimedean copulas in 
Section 15.2.2. 


Notes and Comments 


The name Archimedean relates to an algebraic property of the copulas that resembles 
the Archimedean axiom for real numbers (see Nelsen 2006, p. 122). The Clayton 
copula was introduced in Clayton (1978), although it has also been called the Cook 
and Johnson copula (see Genest and MacKay 1986) and the Pareto copula (see 
Hutchinson and Lai 1990). For the Frank copula see Frank (1979); this copula 
has radial symmetry and is the only such Archimedean copula. A useful reference, 
particularly for bivariate Archimedean copulas, is Nelsen (2006). 

Theorem 7.47 is a result of Schweizer and Sklar (1983) (see also Alsina, Frank 
and Schweizer 2006). The formula for Kendall’s tau in the Archimedean family is 
due to Genest and MacKay (1986). The link between completely monotonic func- 
tions and generators which give Archimedean copulas of the form (7.46) is found 
in Kimberling (1974). See also Feller (1971) for more on the concept of complete 
monotonicity. For more on the important connection between Archimedean gener- 
ators and Laplace transforms, see Joe (1997). 

Proposition 7.51 and Algorithm 7.53 are essentially due to Marshall and Olkin 
(1988). See Frees and Valdez (1997), Schönbucher (2005) and Frey and McNeil
(2003) for further discussion of this technique. A text on simulation techniques for 
copula families is Mai and Scherer (2012). 

Other copula families we have not considered include the Marshall-Olkin copulas
(Marshall and Olkin 1967a,b) and the extremal copulas in Tiit (1996). There is also 
a large literature on pair copulas and vine copulas; fundamental references include 
Bedford and Cooke (2001), Kurowicka and Cooke (2006), Aas et al. (2009) and 
Czado (2010). 


7.5 Fitting Copulas to Data 


We assume that we have data vectors $\boldsymbol{X}_1, \ldots, \boldsymbol{X}_n$ with identical distribution
function $F$, describing financial losses or financial risk-factor returns; we write
$\boldsymbol{X}_t = (X_{t,1}, \ldots, X_{t,d})'$ for an individual data vector and $\boldsymbol{X} = (X_1, \ldots, X_d)'$ for
a generic random vector with df $F$. We assume further that this df $F$ has continuous
margins $F_1, \ldots, F_d$ and thus, by Sklar's Theorem, a unique representation
$F(\boldsymbol{x}) = C(F_1(x_1), \ldots, F_d(x_d))$.

It is often very difficult, particularly in higher dimensions and in situations where 
we are dealing with skewed loss distributions or heterogeneous risk factors, to find 
a good multivariate model that describes both marginal behaviour and dependence 
structure effectively. For multivariate risk-factor return data of a similar kind, such 
as stock returns or exchange-rate returns, we have discussed useful overall models 
such as the GH family of Section 6.2.3, but even in these situations there can be value 
in separating the marginal-modelling and dependence-modelling issues and looking 
at each in more detail. The copula approach to multivariate models facilitates this 
approach and allows us to consider, for example, the issue of whether tail dependence 
appears to be present in our data. 

This section is thus devoted to the problem of estimating the parameters $\theta$ of
a parametric copula $C_\theta$. The main method we consider is maximum likelihood
in Section 7.5.3. First we outline a simpler method-of-moments procedure using 
sample rank correlation estimates. This method has the advantage that marginal 
distributions do not need to be estimated, and consequently inference about the 
copula is in a sense “margin free”. 


7.5.1 Method-of-Moments Using Rank Correlation 


Depending on which particular copula we want to fit, it may be easier to use empirical 
estimates of either Spearman’s or Kendall’s rank correlation to infer an estimate for 
the copula parameter. We begin by discussing the standard estimators of both of 
these rank correlations. 

Proposition 7.34 suggests that we could estimate $\rho_S(X_i, X_j)$ by calculating the
usual correlation coefficient for the pseudo-observations $\{(F_{i,n}(X_{t,i}), F_{j,n}(X_{t,j})) :
t = 1, \ldots, n\}$, where $F_{i,n}$ denotes the standard empirical df for the $i$th margin. In
fact, we estimate $\rho_S(X_i, X_j)$ by calculating the correlation coefficient of the ranks
of the data, a quantity known as the Spearman's rank correlation coefficient. This
coincides with the correlation coefficient of the pseudo-observations when there are
no tied observations (that is, observations with $X_{t,i} = X_{s,i}$ or $X_{t,j} = X_{s,j}$ for some
$t \neq s$).

The rank of $X_{t,i}$, written $\operatorname{rank}(X_{t,i})$, is simply the position of $X_{t,i}$ in the sample
$X_{1,i}, \ldots, X_{n,i}$ when the observations are ordered from smallest to largest. If there
are tied observations, we assign them a rank equal to the average rank that the
observations would have if the ties were randomly broken; for example, the sample
of four observations {2, 3, 2, 1} would have ranks {2.5, 4, 2.5, 1}.

In the case of no ties the Spearman's rank correlation coefficient is given by the
formula

$$r^S_{ij} = \frac{12}{n(n^2 - 1)} \sum_{t=1}^n \bigl(\operatorname{rank}(X_{t,i}) - \tfrac{1}{2}(n + 1)\bigr)\bigl(\operatorname{rank}(X_{t,j}) - \tfrac{1}{2}(n + 1)\bigr). \tag{7.51}$$

We denote by $R^S = (r^S_{ij})$ the matrix of pairwise Spearman's rank correlation coefficients;
since this is the sample correlation matrix of the vectors of ranks, it is clearly
a positive-semidefinite matrix.
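
The ranking convention and formula (7.51) can be sketched as follows. The function names are illustrative only, and `spearman_rho` applies (7.51) literally, so it should be read as the no-ties formula; with ties present it is merely an approximation to the correlation of the average ranks.

```python
def average_ranks(xs):
    """Ranks from smallest to largest; tied values share their average rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    start = 0
    while start < len(order):
        end = start
        # Extend the block [start, end] over all observations tied with xs[order[start]].
        while end + 1 < len(order) and xs[order[end + 1]] == xs[order[start]]:
            end += 1
        avg = 0.5 * (start + end) + 1.0   # average of the 1-based positions in the block
        for pos in range(start, end + 1):
            ranks[order[pos]] = avg
        start = end + 1
    return ranks


def spearman_rho(xs, ys):
    """Sample Spearman rank correlation via formula (7.51) (no-ties case)."""
    n = len(xs)
    rx, ry = average_ranks(xs), average_ranks(ys)
    centre = 0.5 * (n + 1)
    total = sum((a - centre) * (b - centre) for a, b in zip(rx, ry))
    return 12.0 / (n * (n * n - 1)) * total


print(average_ranks([2, 3, 2, 1]))   # the tie example from the text
print(spearman_rho([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]))
```

The first call reproduces the ranks {2.5, 4, 2.5, 1} quoted above for the sample {2, 3, 2, 1}.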

The standard estimator of Kendall's tau $\rho_\tau(X_i, X_j)$ is Kendall's rank correlation
coefficient:

$$r^\tau_{ij} = \binom{n}{2}^{-1} \sum_{1 \leq t < s \leq n} \operatorname{sign}\bigl((X_{t,i} - X_{s,i})(X_{t,j} - X_{s,j})\bigr). \tag{7.52}$$

This is clearly the empirical analogue of the theoretical Kendall's tau in Definition 7.31.
Note that the actual evaluation of this estimator for large $n$ is time-consuming
(in comparison with Spearman's rho) because every pair of observations
must be considered. Again we can collect pairwise Kendall's rank correlation
coefficients in a matrix $R^\tau = (r^\tau_{ij})$; by observing that this matrix may be written as

$$R^\tau = \binom{n}{2}^{-1} \sum_{1 \leq t < s \leq n} \operatorname{sign}(\boldsymbol{X}_t - \boldsymbol{X}_s) \operatorname{sign}(\boldsymbol{X}_t - \boldsymbol{X}_s)',$$

it is again apparent that this gives a positive-semidefinite matrix.
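
A direct, deliberately naive sketch of the matrix form of the estimator is given below. It visits every pair of observations, which makes the quadratic cost mentioned in the text explicit; the function name is an illustrative choice and no attempt is made at the faster algorithms available in statistical libraries.

```python
def sign(z):
    """Sign function with sign(0) = 0."""
    return (z > 0) - (z < 0)


def kendall_tau_matrix(data):
    """Matrix of pairwise Kendall rank correlations for a list of d-dimensional rows."""
    n, d = len(data), len(data[0])
    total = [[0] * d for _ in range(d)]
    for t in range(n):
        for s in range(t + 1, n):
            # Componentwise signs of X_t - X_s.
            sg = [sign(data[t][k] - data[s][k]) for k in range(d)]
            # Accumulate the outer product sign(X_t - X_s) sign(X_t - X_s)'.
            for i in range(d):
                for j in range(d):
                    total[i][j] += sg[i] * sg[j]
    pairs = n * (n - 1) // 2
    return [[total[i][j] / pairs for j in range(d)] for i in range(d)]


sample = [[1, 2, 5], [2, 1, 4], [3, 4, 3], [4, 3, 2], [5, 5, 1]]
r_tau = kendall_tau_matrix(sample)
```

Because the result is a sum of outer products scaled by a positive constant, positive semidefiniteness is built into the computation, mirroring the argument in the text.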

In a series of examples we show how these sample rank correlations can be used 
to calibrate (or partially calibrate) various copulas. Obviously we assume that there 
are a priori grounds for considering the chosen copula to be an appropriate model, 
such as symmetry or the lack of it and the presence or absence of tail dependence. 
The general method will always be similar: we look for a theoretical relationship 
between one of the rank correlations and the parameters of the copula and substitute 
empirical values of the rank correlation into this relationship to get estimates of 
some or all of the copula parameters. 


Example 7.54 (bivariate Archimedean copulas with a single parameter). Suppose
our assumed model is of the form $F(x_1, x_2) = C_\theta(F_1(x_1), F_2(x_2))$, where $\theta$
is a single parameter to be estimated. For many such copulas a simple functional
relationship exists between either Kendall's tau and $\theta$ or Spearman's rho and $\theta$. For
specific examples consider the Gumbel, Clayton and Frank copulas of Section 7.4;
in these cases we have simple relationships of the form $\rho_\tau(X_1, X_2) = f(\theta)$, as
shown in Table 7.5. This suggests that we can calibrate these copulas by first calculating
a sample value $r^\tau$ for Kendall's tau and then solving the equation $r^\tau = f(\hat{\theta})$
for $\hat{\theta}$, assuming that $\hat{\theta}$ is a valid value in the parameter space of the copula. For
example, Gumbel's copula is calibrated by taking $\hat{\theta} = (1 - r^\tau)^{-1}$, provided that
$r^\tau \geq 0$. Clayton's copula interpolates between perfect negative and perfect positive
dependence and can be calibrated to any sample Kendall's tau value in $(-1, 1)$. For
the calibration of higher-dimensional Archimedean copulas using rank correlations,
see Hofert, Mächler and McNeil (2012).
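
The calibration step of Example 7.54 amounts to inverting $f$. The sketch below does this for the Gumbel copula, using the relationship stated in the example, and for the Clayton copula, assuming the standard relationship $\rho_\tau = \theta/(\theta + 2)$ (Table 7.5 is not reproduced here, so this formula is an assumption about its content). The function names are illustrative; the Frank case is left out because its relationship has no closed-form inverse and would need a numerical root-finder.

```python
def gumbel_theta_from_tau(r_tau):
    """Invert tau = 1 - 1/theta; only valid for 0 <= r_tau < 1."""
    if not 0.0 <= r_tau < 1.0:
        raise ValueError("Gumbel calibration requires a sample tau in [0, 1)")
    return 1.0 / (1.0 - r_tau)


def clayton_theta_from_tau(r_tau):
    """Invert tau = theta / (theta + 2) (assumed form); valid for -1 < r_tau < 1."""
    if not -1.0 < r_tau < 1.0:
        raise ValueError("Clayton calibration requires a sample tau in (-1, 1)")
    return 2.0 * r_tau / (1.0 - r_tau)


print(gumbel_theta_from_tau(0.5))    # 2.0
print(clayton_theta_from_tau(0.5))   # 2.0
```

A sample value $r^\tau = 0$ maps to $\hat{\theta} = 1$ for Gumbel and to the limiting value $\hat{\theta} = 0$ for Clayton, both of which correspond to independence.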


Example 7.55 (calibrating Gauss copulas using Spearman's rho). Suppose we
assume a meta-Gaussian model for $\boldsymbol{X}$ with copula $C^{\mathrm{Ga}}_P$, and we wish to estimate the
correlation matrix $P$. It follows from Theorem 7.42 that

$$\rho_S(X_i, X_j) = (6/\pi) \arcsin \tfrac{1}{2}\rho_{ij} \approx \rho_{ij},$$

where the final approximation is very accurate (see Figure 7.10). This suggests we
estimate $P$ by the matrix of pairwise Spearman's rank correlation coefficients $R^S$.

The method of Example 7.55 could be used to estimate $P$ in a $t$ copula model
$C^t_{\nu,P}(F_1(x_1), \ldots, F_d(x_d))$, although the calibration would not be as accurate as
in the Gaussian case. Empirical investigations of the relationship (7.41) based on
the Monte Carlo approximation (7.42) suggest that the error $|\rho_S(X_i, X_j) - \rho_{ij}|$,
while still modest, is larger than in the Gaussian case and increases with decreasing
degrees of freedom $\nu$. Instead we propose a method based on Kendall's tau in the next
example, which is based on Proposition 7.43 and could be applied to all elliptical
copulas.


Example 7.56 (calibrating $t$ copulas using Kendall's tau). Suppose we assume
a meta-$t$ model for $\boldsymbol{X}$ with copula $C^t_{\nu,P}$ and we wish to estimate the correlation
matrix $P$. It follows from Proposition 7.43 that

$$\rho_\tau(X_i, X_j) = (2/\pi) \arcsin \rho_{ij},$$

so that a possible estimator of $P$ is the matrix $R^*$ with components given by
$r^*_{ij} = \sin(\tfrac{1}{2}\pi r^\tau_{ij})$. However, there is no guarantee that this componentwise
transformation of the matrix of Kendall's rank correlation coefficients will remain
positive definite (although in our experience it very often does). In this case $R^*$ can be
transformed by the eigenvalue method given in Algorithm 7.57 to obtain a
positive-definite matrix that is close to $R^*$. The remaining parameter $\nu$ of the copula could
then be estimated by maximum likelihood, as discussed in Section 7.5.3.
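
The componentwise transformation in Example 7.56 is a one-liner in code. The sketch below also includes, as a hedged aside, the exact inversion of the Spearman relationship from Example 7.55, $\rho_{ij} = 2\sin(\pi\rho_S/6)$, which follows from solving $\rho_S = (6/\pi)\arcsin(\rho_{ij}/2)$ and can be used instead of the approximation $\rho_S \approx \rho_{ij}$. The function names are illustrative.

```python
import math


def kendall_to_correlation(r_tau):
    """Map a matrix of Kendall rank correlations to r*_ij = sin(pi * r_tau_ij / 2)."""
    d = len(r_tau)
    return [
        [1.0 if i == j else math.sin(0.5 * math.pi * r_tau[i][j]) for j in range(d)]
        for i in range(d)
    ]


def spearman_to_correlation(r_s):
    """Exact inversion of rho_S = (6/pi) arcsin(rho/2) for a Gauss copula."""
    d = len(r_s)
    return [
        [1.0 if i == j else 2.0 * math.sin(math.pi * r_s[i][j] / 6.0) for j in range(d)]
        for i in range(d)
    ]


r_tau = [[1.0, 0.5, 1.0 / 3.0], [0.5, 1.0, 0.2], [1.0 / 3.0, 0.2, 1.0]]
r_star = kendall_to_correlation(r_tau)
```

The output `r_star` is symmetric with unit diagonal by construction, but nothing in the transformation guarantees positive definiteness, which is exactly the gap the eigenvalue method is meant to close.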


Algorithm 7.57 (eigenvalue method). Let $R^*$ be a so-called pseudo-correlation
matrix, i.e. a symmetric matrix of pairwise correlation estimates with unit diagonal
entries and off-diagonal entries in $[-1, 1]$ that is not positive semidefinite.


(1) Calculate the spectral decomposition $R^* = G L G'$ as in (6.59), where $L$ is
the matrix of eigenvalues and $G$ is an orthogonal matrix whose columns are
eigenvectors of $R^*$.


(2) Replace all negative eigenvalues in $L$ by small values $\delta > 0$ to obtain $\tilde{L}$.


(3) Calculate $Q = G \tilde{L} G'$, which will be symmetric and positive definite but not
a correlation matrix, since its diagonal elements will not necessarily equal
one.


(4) Return the correlation matrix $R = \wp(Q)$, where $\wp$ denotes the correlation
matrix operator defined in (6.5).
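
The four steps of Algorithm 7.57 translate almost line by line into NumPy. This is a sketch under two assumptions: the correlation matrix operator of (6.5), which is not reproduced here, is taken to be the usual rescaling $\wp(Q) = \Delta^{-1} Q \Delta^{-1}$ with $\Delta = \operatorname{diag}(\sqrt{q_{11}}, \ldots, \sqrt{q_{dd}})$, and the value of `delta` is an arbitrary illustrative choice. The function name is hypothetical.

```python
import numpy as np


def eigenvalue_method(r_star, delta=1e-6):
    """Repair a symmetric pseudo-correlation matrix following Algorithm 7.57."""
    r_star = np.asarray(r_star, dtype=float)
    # Step (1): spectral decomposition R* = G L G' (eigh assumes a symmetric input).
    eigvals, g = np.linalg.eigh(r_star)
    # Step (2): replace negative eigenvalues by the small positive value delta.
    eigvals_fixed = np.where(eigvals < 0.0, delta, eigvals)
    # Step (3): rebuild Q = G L~ G'; its diagonal is generally not equal to one.
    q = g @ np.diag(eigvals_fixed) @ g.T
    # Step (4): rescale to unit diagonal (assumed form of the operator in (6.5)).
    scale = 1.0 / np.sqrt(np.diag(q))
    return q * np.outer(scale, scale)


# A pseudo-correlation matrix with one negative eigenvalue (equal to -0.8).
r_star = np.array([[1.0, 0.9, -0.9],
                   [0.9, 1.0, 0.9],
                   [-0.9, 0.9, 1.0]])
r_fixed = eigenvalue_method(r_star)
```

Note that only strictly negative eigenvalues are replaced, as in step (2); an eigenvalue that is exactly zero is left alone, so the output is then positive semidefinite rather than positive definite.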


In Examples 7.55 and 7.56 we saw that it is relatively easy to calibrate the Gauss 
copula and the correlation parameter matrix P of the t copula to sample rank cor- 
relations. This technique is particularly useful when we have limited multivariate 
data and formal estimation of a full multivariate model is unrealistic. Consider the 
following hypothetical example. 
Example 7.58 (fictitious risk integration situation). Suppose a company is
divided into a number of business units that function semi-autonomously. The com- 
pany management would like to calculate an enterprise-wide P&L distribution for 
a one-month period. They have historical data on monthly results for each of the 
business units for the last two years only, i.e. twenty-four observations. However, 
each business unit believes that through detailed knowledge of their own business 
going back over a longer period they can specify their own P&L fairly accurately. 
Rather than attempting to fit a multivariate distribution to twenty-four observations, 
the risk-management team decides to combine the individual marginal models pro- 
vided by each of the business units using a matrix of rank correlations estimated 
from the twenty-four data points. 


In this situation we can build multivariate models by combining the known 
marginal distributions using any copula that can be calibrated to the estimated rank 
correlations. The Gaussian and $t$ copulas lend themselves to this purpose and can be
used to build meta-Gaussian and meta-t models that are consistent with the available 
information. 

Typically, these models could then be used in a Monte Carlo risk analysis; we have 
seen in Section 7.1.4 that meta-Gaussian and meta-t models are particularly easy to 
simulate. Because the approach is obviously prone to model risk (24 observations 
provide very meagre multivariate data) it should be seen as a form of sensitivity 
analysis performed using detailed marginal information and only vague depend- 
ence information; we might choose to compare a meta-Gaussian model with no tail 
dependence and a meta-t model with, say, three degrees of freedom and very strong 
tail dependence. In Section 8.4.4 we will have more to say on this problem of risk 
integration under dependence uncertainty. 


7.5.2 Forming a Pseudo-sample from the Copula 


We now turn to the estimation of parametric copulas by maximum likelihood (ML). 
In practical situations we are seldom interested in the copula alone, but also require 
estimates of the margins to form a full multivariate model; even when the copula is 
of central interest, as it is for us in this chapter, we are forced to estimate margins in 
order to estimate the copula, since copula data are almost never observed directly. 

While we may attempt to estimate margins and copula in one single optimiza- 
tion, splitting the modelling into two steps can yield more insight and allow a more 
detailed analysis of the different model components. In this section we describe 
briefly some general approaches to the first step of estimating margins and construct- 
ing a pseudo-sample of observations from the copula. In the following section we 
describe how the copula parameters are estimated by ML from the pseudo-sample. 

Let $\hat{F}_1, \ldots, \hat{F}_d$ denote estimates of the marginal dfs (possible methods are
discussed below). The pseudo-sample from the copula consists of the vectors
$\hat{\boldsymbol{U}}_1, \ldots, \hat{\boldsymbol{U}}_n$, where

$$\hat{\boldsymbol{U}}_t = (\hat{U}_{t,1}, \ldots, \hat{U}_{t,d})' = (\hat{F}_1(X_{t,1}), \ldots, \hat{F}_d(X_{t,d}))'. \tag{7.53}$$

Observe that, even if the original data vectors $\boldsymbol{X}_1, \ldots, \boldsymbol{X}_n$ are iid, the pseudo-sample
data are generally dependent, because the marginal estimates $\hat{F}_i$ will in most cases
be constructed from all of the original data vectors through the univariate samples
$X_{1,i}, \ldots, X_{n,i}$. Possible methods for obtaining the marginal estimate $\hat{F}_i$ include the
following.


(1) Parametric estimation. We choose an appropriate parametric model for the 
data in question and fit it by ML: for financial risk-factor return data we might 
consider the GH distribution, or one of its special cases such as Student $t$ or
normal inverse Gaussian (NIG); for insurance or operational loss data we might 
consider a standard actuarial loss distribution such as Pareto or lognormal. 


(2) Non-parametric estimation with variant of empirical df. We could estimate
$F_i$ using

$$F^*_{i,n}(x) = \frac{1}{n + 1} \sum_{t=1}^n I_{\{X_{t,i} \leq x\}}, \tag{7.54}$$

which differs from the usual empirical df by the use of the denominator $n + 1$
rather than $n$. This guarantees that the pseudo-copula data in (7.53) lie strictly
in the interior of the unit cube; to implement ML we must be able to evaluate
the copula density at each $\hat{\boldsymbol{U}}_t$, and in many cases this density is infinite on the
boundary of the cube.


(3) Extreme value theory for the tails. Empirical distribution functions are known 
to be poor estimators of the underlying distribution in the tails. An alternative is to 
use a technique from extreme value theory, described in Section 5.2.6, whereby 
the tails are modelled semi-parametrically using a generalized Pareto distribution 
(GPD); the body of the distribution may be modelled empirically. 


Example 7.59. We analyse five years of daily log-return data (1996-2000)
for Intel, Microsoft and General Electric stocks. The marginal distributions are
estimated empirically (method (2)) and the pseudo-sample from the copula is
shown in Figure 7.13. Essentially, the points are plotted at the coordinates
$(\operatorname{rank}(X_{t,i})/(n + 1), \operatorname{rank}(X_{t,j})/(n + 1))$, where $\operatorname{rank}(X_{t,i})$ denotes the rank of
$X_{t,i}$ in the sample $X_{1,i}, \ldots, X_{n,i}$.
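
Method (2) can be sketched directly from (7.54): each observation is replaced by the number of sample values in its margin that do not exceed it, divided by $n + 1$. The function name below is illustrative, and the tiny data set is invented purely to show the mechanics; it is not the stock data of Example 7.59.

```python
from bisect import bisect_right


def pseudo_observations(data):
    """Apply the rescaled empirical df (7.54) margin by margin."""
    n, d = len(data), len(data[0])
    # Sort each margin once so that counts of values <= x come from a binary search.
    columns = [sorted(row[j] for row in data) for j in range(d)]
    return [
        [bisect_right(columns[j], row[j]) / (n + 1) for j in range(d)]
        for row in data
    ]


toy_returns = [[0.1, 5.0], [-0.3, 7.0], [0.2, 6.0]]   # made-up illustration data
u_hat = pseudo_observations(toy_returns)
print(u_hat)
```

With no ties the result coincides with $\operatorname{rank}/(n + 1)$, so every coordinate lies between $1/(n + 1)$ and $n/(n + 1)$ and never touches the boundary of the unit cube.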


7.5.3 Maximum Likelihood Estimation 


Let $C_\theta$ denote a parametric copula, where $\theta$ is the vector of parameters to be
estimated. The MLE is obtained by maximizing

$$\ln L(\theta; \hat{\boldsymbol{U}}_1, \ldots, \hat{\boldsymbol{U}}_n) = \sum_{t=1}^n \ln c_\theta(\hat{\boldsymbol{U}}_t) \tag{7.55}$$

with respect to $\theta$, where $c_\theta$ denotes the copula density as in (7.18) and $\hat{\boldsymbol{U}}_t$ denotes
a pseudo-observation from the copula.

Figure 7.13. Pairwise scatterplots of a pseudo-sample from a copula for trivariate Intel,
Microsoft and General Electric log-returns (see Example 7.59).

Obviously, the statistical quality of the estimates of the copula parameters depends
very much on the quality of the estimates of the marginal distributions used in
the formation of the pseudo-sample from the copula. The properties of estimates
derived using the marginal estimation methods (1) and (2) in Section 7.5.2 have both 
been studied in more theoretical detail. When margins are estimated parametrically 
(method (1)), inference about the copula using (7.55) amounts to what has been 
termed the inference functions for margins (IFM) approach by Joe (1997). When 
margins are estimated non-parametrically (method (2)), the estimates of the copula 
parameters may be regarded as semi-parametric and the approach has been labelled 
pseudo-maximum likelihood by Genest and Rivest (1993) (see Notes and Comments 
for more references). One could envisage using the two-stage method to decide on 
the most appropriate copula family and then estimating all parameters (marginal 
and copula) in a final fully parametric round of estimation. 

In practice, to implement the ML method we need to derive the copula density. 
This is straightforward, if tedious, for the exchangeable Archimedean copulas of 
Section 7.4, and these have been popular models in bivariate and trivariate applica- 
tions to insurance loss data. For implicit copulas like the Gaussian and t copulas we 
use (7.19). The MLE is generally found by numerical maximization of the resulting 
log-likelihood (7.55). 
Example 7.60 (fitting the Gauss copula). In the case of a Gauss copula we
use (7.19) to see that the log-likelihood (7.55) becomes

$$\ln L(P; \hat{\boldsymbol{U}}_1, \ldots, \hat{\boldsymbol{U}}_n) = \sum_{t=1}^n \ln f_P(\Phi^{-1}(\hat{U}_{t,1}), \ldots, \Phi^{-1}(\hat{U}_{t,d})) - \sum_{t=1}^n \sum_{j=1}^d \ln \phi(\Phi^{-1}(\hat{U}_{t,j})),$$

where $f_\Sigma$ will be used to denote the joint density of a random vector with $N_d(\boldsymbol{0}, \Sigma)$
distribution. It is clear that the second term is not relevant in the maximization with
respect to $P$, and the MLE is given by

$$\hat{P} = \arg\max_{\Sigma \in \mathcal{P}} \sum_{t=1}^n \ln f_\Sigma(\boldsymbol{Y}_t), \tag{7.56}$$

where $Y_{t,j} = \Phi^{-1}(\hat{U}_{t,j})$ for $j = 1, \ldots, d$ and $\mathcal{P}$ denotes the set of all possible
linear correlation matrices. To perform this maximization in practice, note that the
set $\mathcal{P}$ can be constructed as

$$\mathcal{P} = \{P = \wp(Q) : Q = AA', \ A \text{ lower triangular with ones on the diagonal}\},$$

where $\wp$ is defined in (6.5). In other words, we can search over the set of unrestricted
lower-triangular matrices with ones on the diagonal. This search is feasible
in low dimensions but very slow in high dimensions, since the number of parameters
is $O(d^2)$.

An approximate solution to the maximization may be obtained easily as follows.
Suppose that instead of maximizing over $\mathcal{P}$ as in (7.56) we maximize over the set
of all covariance matrices. This maximization problem has the analytical solution
$\hat{\Sigma} = (1/n) \sum_{t=1}^n \boldsymbol{Y}_t \boldsymbol{Y}_t'$, which is the MLE of the covariance matrix $\Sigma$ for iid normal
data with $N_d(\boldsymbol{0}, \Sigma)$ distribution. In practice, $\hat{\Sigma}$ is likely to be close to being a
correlation matrix. As an approximate solution to the original problem we could
take the correlation matrix $\hat{P} = \wp(\hat{\Sigma})$.
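
The approximate solution can be sketched with NumPy and the standard library's normal quantile function. The function names are illustrative, and the operator $\wp$ is again assumed to be the usual rescaling to unit diagonal. The log-likelihood helper uses the simplification $\ln f_P(\boldsymbol{y}) - \sum_j \ln\phi(y_j) = -\tfrac{1}{2}\ln\det P - \tfrac{1}{2}\boldsymbol{y}'(P^{-1} - I)\boldsymbol{y}$, which follows by writing out the two normal densities.

```python
import numpy as np
from statistics import NormalDist


def gauss_copula_approx_mle(u_hat):
    """Approximate MLE of P: normal scores, covariance MLE, rescale to correlation."""
    u_hat = np.asarray(u_hat, dtype=float)
    inv_cdf = np.vectorize(NormalDist().inv_cdf)
    y = inv_cdf(u_hat)                         # Y_{t,j} = Phi^{-1}(U_hat_{t,j})
    sigma_hat = y.T @ y / y.shape[0]           # (1/n) sum_t Y_t Y_t'
    scale = 1.0 / np.sqrt(np.diag(sigma_hat))
    p_hat = sigma_hat * np.outer(scale, scale) # assumed form of the operator in (6.5)
    return p_hat, y


def gauss_copula_loglik(p, y):
    """Gauss copula log-likelihood evaluated at correlation matrix p."""
    n, d = y.shape
    _, logdet = np.linalg.slogdet(p)
    quad = np.einsum("ti,ij,tj->", y, np.linalg.inv(p) - np.eye(d), y)
    return -0.5 * n * logdet - 0.5 * quad


# Illustration on simulated data (not the data of Example 7.59).
rng = np.random.default_rng(0)
z = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.6], [0.6, 1.0]], size=2000)
ranks = z.argsort(axis=0).argsort(axis=0) + 1
u_hat = ranks / (len(z) + 1)
p_hat, y = gauss_copula_approx_mle(u_hat)
loglik = gauss_copula_loglik(p_hat, y)
```

A full ML fit would instead pass `gauss_copula_loglik` to a numerical optimizer over the lower-triangular parametrization described in the example; the closed-form shortcut avoids that search entirely.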

When a Gauss copula is fitted to the trivariate data in Example 7.59 by full ML, 
the estimated correlation matrix has entries 0.58 (INTC-MSFT), 0.34 (INTC-GE) 
and 0.40 (MSFT-GE); the value of the log-likelihood at the maximum is 376.65. 
Using the alternative method gives estimates that are identical to two significant 
figures and that yield a log-likelihood value of 376.62. 

A further alternative would be to use the estimation procedure in Example 7.55, 
based on Spearman’s rank correlations. Using the Spearman method we get, respec- 
tively, 0.57, 0.34 and 0.40 for the parameter estimates; the value of the log-likelihood 
at this value of P is 376.50, which is also not so far from the maximum. 


Example 7.61 (fitting the $t$ copula). In the case of the $t$ copula, (7.19) implies that
the log-likelihood (7.55) is

$$\ln L(\nu, P; \hat{\boldsymbol{U}}_1, \ldots, \hat{\boldsymbol{U}}_n) = \sum_{t=1}^n \ln g_{\nu,P}(t_\nu^{-1}(\hat{U}_{t,1}), \ldots, t_\nu^{-1}(\hat{U}_{t,d})) - \sum_{t=1}^n \sum_{j=1}^d \ln g_\nu(t_\nu^{-1}(\hat{U}_{t,j})),$$

where $g_{\nu,P}$ denotes the joint density of a random vector with $t_d(\nu, \boldsymbol{0}, P)$ distribution,
$P$ is a linear correlation matrix, $g_\nu$ is the density of a univariate $t_1(\nu, 0, 1)$
distribution, and $t_\nu^{-1}$ is the corresponding quantile function.

Again, in relatively low dimensions we could search over the set of correlation
matrices $P$ and degrees of freedom parameter $\nu$ for a global maximum. For
higher-dimensional work it would be easier to estimate $P$ using Kendall's tau estimates, as
in Example 7.56, and to estimate the single parameter $\nu$ by maximum likelihood.

When a $t$ copula is fitted to the trivariate data in Example 7.59 by full ML,
the estimated matrix $P$ has entries 0.59 (INTC-MSFT), 0.36 (INTC-GE) and 0.42
(MSFT-GE); the estimate of $\nu$ is 6.5 and the value of the log-likelihood at the
maximum is 420.39. Using the simpler method based on Kendall's tau gives identical
parameter estimates to two significant figures and a log-likelihood value of 420.32.
Clearly, the $t$ model fits much better than a Gauss copula model; the log-likelihood
is increased by over 40. This would be massively significant in a likelihood ratio
test (although, strictly speaking, such a test introduces a technical difficulty, since
the Gauss copula represents a boundary case of the $t$ copula model ($\nu = \infty$), which
violates standard regularity conditions (see Notes and Comments)).


Following standard statistical practice we usually fit a number of copula models 
to data and compare the quality of the fitted models using tools like the Akaike 
information criterion (see Appendix A.3.6). We may also carry out goodness-of-fit 
tests to assess the plausibility that the data come from any given copula. Most of the 
goodness-of-fit tests that have been suggested for copulas are quite computationally 
intensive and are limited to applications in relatively low dimensions (see Notes and 
Comments for some references). 
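
To make the model comparison concrete, the following sketch applies the Akaike information criterion to the two full-ML fits reported in Examples 7.60 and 7.61. It assumes the common convention $\mathrm{AIC} = 2k - 2\ln L$ with $k$ the number of estimated copula parameters (three correlations for the Gauss copula, three correlations plus $\nu$ for the $t$ copula); Appendix A.3.6 is not reproduced here, so the exact convention used there is an assumption.

```python
def aic(loglik, n_params):
    """Akaike information criterion, assumed convention: 2k - 2 ln L (smaller is better)."""
    return 2.0 * n_params - 2.0 * loglik


# Maximized log-likelihoods quoted in Examples 7.60 and 7.61.
aic_gauss = aic(376.65, 3)
aic_t = aic(420.39, 4)

# Likelihood ratio statistic for Gauss versus t (subject to the boundary caveat in the text).
lr_stat = 2.0 * (420.39 - 376.65)

print(round(aic_gauss, 2), round(aic_t, 2), round(lr_stat, 2))
```

Under this convention the $t$ copula has the smaller AIC by a wide margin, in line with the conclusion drawn from the raw log-likelihood values.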


Notes and Comments 


The copula estimation procedure based on empirical values of Kendall’s tau is dis- 
cussed in detail for bivariate Archimedean copulas by Genest and Rivest (1993); 
they explain why the procedure may be considered to be a method-of-moments 
technique and show how confidence intervals for the copula parameter (in the case 
of single-parameter copulas) may be derived. 

The method of calibrating the Gauss copula with Spearman’s rank correlation 
in Example 7.55 is essentially due to Iman and Conover (1982). The use of this 
calibration method to build meta-Gaussian models with prescribed margins and the 
Monte Carlo simulation of data from these models are implemented in the @RISK 
software (Palisade 1997), which is widely used in insurance. Our Example 7.56 is 
intended to show that this approach can be extended to meta-t models, which may 
well be more interesting due to their tail dependence. 

The eigenvalue method for correcting the positive definiteness of correlation 
matrices given in Algorithm 7.57 is described by Rousseeuw and Molenberghs 
(1993). An empirical comparison of the eigenvalue method with different approaches 
to this problem, including so-called shrinkage methods, is found in Lindskog (2000). 

The inference functions for margins (IFM) approach to the estimation of copulas 
(method (1) of Section 7.5.2 followed by maximization of (7.55)) is described by 
Joe (1997), who gives asymptotic theory; the name of the approach (IFM) follows
terminology of McLeish and Small (1988). 

The pseudo-likelihood approach to copula estimation (method (2) of Section 7.5.2 
followed by maximization of (7.55)) is described in Genest and Rivest (1993), and 
the consistency and asymptotic normality of the resulting parameter estimates are 
demonstrated. In Monte Carlo simulations it is found that this method outperforms 
the Kendall’s tau method for a bivariate Clayton copula (see also Genest, Ghoudi 
and Rivest 1995). 

Frees and Valdez (1997) discuss the relevance of copulas in actuarial applications 
and give an example where copulas are fitted to data using the Kendall’s tau method 
and the IFM method. Also in an insurance context, Klugman and Parsa (1999) discuss 
ML inference for copulas and bivariate goodness-of-fit tests, while Chen and Fan 
(2005) describe a likelihood-ratio test for semi-parametric copula selection. 

The fitting of the $t$ copula to data and statistical aspects of testing this copula
against the Gauss copula are discussed at length in Mashal and Zeevi (2002);
the technical problem that the Gauss copula is a boundary case of the $t$ copula is
addressed in this paper and a correction is suggested. The authors provide a number 
of financial examples suggesting that extremal dependence is a feature of finan- 
cial data. Breymann, Dias and Embrechts (2003) fit various bivariate copulas to 
high-frequency financial return data at different timescales and provide extensive 
comparisons with respect to goodness-of-fit. 

There is a growing literature on goodness-of-fit tests for copulas: see the survey 
article by Genest, Rémillard and Beaudoin (2009). For an attractive graphical test, 
see Hofert and Mächler (2014).

Papers that develop dynamic time-series models for financial return data using 
copulas include Chen and Fan (2006), Patton (2004, 2006) and Fortin and Kuzmics 
(2002). A change-point problem for copulas within econometrics is discussed in 
Dias and Embrechts (2009). 
Aggregate Risk


This chapter is devoted to a number of theoretical concepts in quantitative risk 
management that fall under the broad heading of aggregate risk and integrated risk 
management. We understand aggregate risk as the risk of a portfolio, which could 
even be the entire position in risky assets of a financial institution. The material 
builds on general ideas in risk measurement discussed in Section 2.3 and also uses 
in certain places the copula theory of Chapter 7 and some facts about elliptical 
distributions from Section 6.3. 

In Sections 8.1-8.3 we treat the issue of measuring aggregate risk. We begin 
with general results and we discuss, in particular, the dual representation of convex 
risk measures as generalized scenarios (a mathematical extension of the idea of a 
stress test). Next we consider certain law-invariant risk measures (risk measures that 
depend only on the loss distribution). Finally, we apply the representation result to 
the case of linear portfolios and we discuss risk measurement for the special case 
of portfolios that are linear combinations of elliptically distributed risk factors. 

Section 8.4 is concerned with risk aggregation: we assume that risk capital num- 
bers for sub-units of an enterprise have been computed and we discuss methods 
for aggregating these risk capital numbers into a capital requirement for the entire 
enterprise. Moreover, we consider the problem of bounding an aggregate risk if 
we know something about the individual risks that contribute to the whole but have 
only limited information about their dependence. We discuss specific difficulties that 
arise when risk is measured with a non-subadditive risk measure like VaR. Finally, 
in Section 8.5, we treat the subject of allocating risk capital for an aggregate risk 
back to the individual risks in the portfolio. This issue is relevant for the purposes 
of performance measurement, loan pricing and capital budgeting. 


8.1 Coherent and Convex Risk Measures 


In this section we present elements of the modern theory of risk measures. Our 
exposition is a simplified account of material found in Föllmer and Schied (2011). 
We begin by recalling the axioms characterizing coherent and convex risk measures. 
For the economic motivation of these axioms we refer to Section 2.3.5. 

Consider a probability space $(\Omega, \mathcal{F}, P)$ and a linear space
$\mathcal{M} \subset L^0(\Omega, \mathcal{F}, P)$, where $L^0(\Omega, \mathcal{F}, P)$ denotes
the set of all random variables on $(\Omega, \mathcal{F}, P)$ that are almost surely (a.s.)
finite. Each $L \in \mathcal{M}$ represents the loss incurred on a financial position over
some fixed time horizon. We assume throughout that constant random variables belong to
$\mathcal{M}$ and denote them by lowercase letters. In this context a risk measure is a
mapping $\varrho \colon \mathcal{M} \to \mathbb{R}$ with the interpretation that $\varrho(L)$
gives the total amount of capital that is needed to back a position with loss $L$. The
axioms for $\varrho$ are as follows. 


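Monotonicity. $L_1 \le L_2 \Rightarrow \varrho(L_1) \le \varrho(L_2)$. 

Translation invariance. For $m \in \mathbb{R}$, $\varrho(L + m) = \varrho(L) + m$. 

Subadditivity. For $L_1, L_2 \in \mathcal{M}$, $\varrho(L_1 + L_2) \le \varrho(L_1) + \varrho(L_2)$. 

Positive homogeneity. For $\lambda \ge 0$, $\varrho(\lambda L) = \lambda \varrho(L)$. 

Convexity. For $0 \le \gamma \le 1$ and $L_1, L_2 \in \mathcal{M}$, 

$$\varrho(\gamma L_1 + (1 - \gamma) L_2) \le \gamma \varrho(L_1) + (1 - \gamma) \varrho(L_2).$$ 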


Definition 8.1. A risk measure that satisfies the monotonicity, translation invariance 
and convexity axioms is called a convex measure of risk; a risk measure that satisfies 
the monotonicity, translation invariance, subadditivity and positive homogeneity 
axioms is called a coherent measure of risk. 


A coherent risk measure is automatically convex; the converse implication is not 
true, as will be seen below. On the other hand, for a positive-homogeneous risk 
measure, convexity and coherence are equivalent. 


8.1.1 Risk Measures and Acceptance Sets 


There is an important relationship between risk measures and so-called acceptance 
sets. For a given risk measure the associated acceptance set contains the positions 
that are acceptable without any backing capital. 


Definition 8.2. For a monotone and translation-invariant risk measure $\varrho$, the
associated acceptance set of $\varrho$ is the set 

$$\mathcal{A}_\varrho = \{L \in \mathcal{M} : \varrho(L) \le 0\}.$$ (8.1) 


Proposition 8.3. For a monotone and translation-invariant risk measure $\varrho$ with
associated acceptance set $\mathcal{A}_\varrho$, the following statements hold. 

(1) $\mathcal{A}_\varrho$ is nonempty and satisfies the condition 

$$L \in \mathcal{A}_\varrho \text{ and } \tilde{L} \le L \;\Rightarrow\; \tilde{L} \in \mathcal{A}_\varrho.$$ (8.2) 

(2) $\varrho$ can be reconstructed from $\mathcal{A}_\varrho$ via 

$$\varrho(L) = \inf\{m \in \mathbb{R} : L - m \in \mathcal{A}_\varrho\}.$$ (8.3) 

Proof. Statement (1) is obvious. For (2) note that 

$$\inf\{m : L - m \in \mathcal{A}_\varrho\} = \inf\{m : \varrho(L - m) \le 0\} = \inf\{m : \varrho(L) - m \le 0\},$$ 

and this is obviously equal to $\varrho(L)$. 

Conversely, it is sometimes useful to start with a set $\mathcal{A} \subset \mathcal{M}$ of
acceptable positions and to define an associated risk measure $\varrho_{\mathcal{A}}$
using (8.3). The properties of such a risk measure are given in the following proposition. 


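Proposition 8.4. Suppose that the set $\mathcal{A}$ satisfies (8.2) and define
$\varrho_{\mathcal{A}}$ by 

$$\varrho_{\mathcal{A}}(L) = \inf\{m \in \mathbb{R} : L - m \in \mathcal{A}\}.$$ (8.4) 

Suppose, moreover, that $\varrho_{\mathcal{A}}(L)$ defined in this way is finite for all
$L \in \mathcal{M}$. Then $\varrho_{\mathcal{A}}$ is a monotone and translation-invariant
risk measure on $\mathcal{M}$. The associated acceptance set
$\mathcal{A}_{\varrho_{\mathcal{A}}}$ satisfies
$\mathcal{A}_{\varrho_{\mathcal{A}}} \supseteq \mathcal{A}$. 


Proof. These properties of $\varrho_{\mathcal{A}}$ are easily checked. 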


Remark 8.5. It is natural to enquire when the sets $\mathcal{A}$ and
$\mathcal{A}_{\varrho_{\mathcal{A}}}$ in Proposition 8.4 are equal. One result in that
direction is given in Section 4.1 of Föllmer and Schied (2011) for the case where
$\mathcal{M}$ contains only bounded random variables: in that case,
$\mathcal{A} = \mathcal{A}_{\varrho_{\mathcal{A}}}$ if and only if the set $\mathcal{A}$ is
closed in the supremum norm. 


In the next proposition we require some further basic ideas from convex analysis. 
A set $C \subset \mathcal{M}$ is said to be convex if $(1 - \gamma)x + \gamma y \in C$
whenever $x \in C$, $y \in C$ and $0 \le \gamma \le 1$. A convex set is a convex cone if it
has the additional property that it is closed under positive scalar multiplication, i.e.
$\lambda x \in C$ when $x \in C$ and $\lambda > 0$. 


Proposition 8.6. 


(a) Consider a monotone and translation-invariant risk measure $\varrho$ with associated
acceptance set $\mathcal{A}_\varrho$ defined by (8.1). Then 

(a1) $\varrho$ is a convex risk measure if and only if $\mathcal{A}_\varrho$ is convex, and 

(a2) $\varrho$ is coherent if and only if $\mathcal{A}_\varrho$ is a convex cone. 

(b) More generally, consider a set of acceptable positions $\mathcal{A}$ and the associated
risk measure $\varrho_{\mathcal{A}}$ defined by (8.4) (whose acceptance set may be larger
than $\mathcal{A}$). Then $\varrho_{\mathcal{A}}$ is a convex risk measure if $\mathcal{A}$
is convex and $\varrho_{\mathcal{A}}$ is coherent if $\mathcal{A}$ is a convex cone. 


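Proof. For part (a1) it is clear that $\mathcal{A}_\varrho$ is convex if $\varrho$ is
convex. For the converse direction, consider arbitrary $L_1, L_2 \in \mathcal{M}$ and
$0 \le \gamma \le 1$. Now for $i = 1, 2$, $L_i - \varrho(L_i) \in \mathcal{A}_\varrho$ by
definition of $\mathcal{A}_\varrho$. Since $\mathcal{A}_\varrho$ is convex, we also have that
$\gamma(L_1 - \varrho(L_1)) + (1 - \gamma)(L_2 - \varrho(L_2)) \in \mathcal{A}_\varrho$. By the
definition of $\mathcal{A}_\varrho$ and translation invariance we have 

$$0 \ge \varrho\bigl(\gamma(L_1 - \varrho(L_1)) + (1 - \gamma)(L_2 - \varrho(L_2))\bigr)
    = \varrho(\gamma L_1 + (1 - \gamma) L_2) - \bigl(\gamma \varrho(L_1) + (1 - \gamma)\varrho(L_2)\bigr),$$ 

which implies the convexity of $\varrho$. 

To prove (a2) assume that $L \in \mathcal{A}_\varrho$. As $\varrho$ is positive homogeneous,
$\varrho(\lambda L) = \lambda \varrho(L) \le 0$ and hence $\lambda L \in \mathcal{A}_\varrho$.
Conversely, for $L \in \mathcal{M}$ we have $L - \varrho(L) \in \mathcal{A}_\varrho$ and, as
$\mathcal{A}_\varrho$ is a convex cone, also $\lambda(L - \varrho(L)) \in \mathcal{A}_\varrho$
for all $\lambda > 0$. Hence, by translation invariance, 

$$0 \ge \varrho(\lambda L - \lambda \varrho(L)) = \varrho(\lambda L) - \lambda \varrho(L).$$ (8.5) 

For the opposite inequality note that for $m < \varrho(L)$, $L - m \notin \mathcal{A}_\varrho$
and hence also $\lambda(L - m) \notin \mathcal{A}_\varrho$ for all $\lambda > 0$. Hence 

$$0 < \varrho(\lambda L - \lambda m) = \varrho(\lambda L) - \lambda m,$$ 

i.e. $\varrho(\lambda L) > \lambda m$. By taking the supremum we get
$\varrho(\lambda L) \ge \sup\{\lambda m : m < \varrho(L)\} = \lambda \varrho(L)$; together
with (8.5) the claim follows. 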


The proof of (b) uses similar arguments to parts (a1) and (a2) and is omitted. 


We now give a number of examples of risk measures and acceptance sets. 


Example 8.7 (value-at-risk). Given a confidence level $\alpha \in (0, 1)$, suppose we call
an rv $L \in \mathcal{M}$ acceptable if $P(L > 0) \le 1 - \alpha$. The associated risk
measure defined by (8.4) is given by 

$$\varrho_\alpha(L) := \inf\{m \in \mathbb{R} : P(L - m > 0) \le 1 - \alpha\} = \inf\{m \in \mathbb{R} : P(L \le m) \ge \alpha\},$$ 

which is the VaR at confidence level $\alpha$. 
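
The construction in Example 8.7 can be tried out on a loss with finitely many outcomes. The following Python sketch is only an illustration under our own assumptions: the function name `var_from_acceptance_set` and the two-position portfolio are hypothetical and are not taken from the text. For a discrete loss the infimum in (8.4) is attained at one of the atoms, so it suffices to scan the atoms in increasing order. The example also shows that the resulting risk measure can fail to be subadditive.

```python
def var_from_acceptance_set(values, probs, alpha):
    """Return inf{m : P(L - m > 0) <= 1 - alpha} for a discrete loss L."""
    tol = 1e-12  # guards against floating-point noise in the comparison
    for m in sorted(set(values)):
        tail = sum(p for v, p in zip(values, probs) if v > m)
        if tail <= 1.0 - alpha + tol:
            return m
    raise ValueError("probs must sum to one and alpha must lie in (0, 1)")


alpha = 0.95
# Hypothetical position: lose 100 with probability 0.04, otherwise lose nothing.
single = ([0.0, 100.0], [0.96, 0.04])
# Sum of two independent copies of that position.
pair = ([0.0, 100.0, 200.0], [0.96 ** 2, 2 * 0.96 * 0.04, 0.04 ** 2])

var_single = var_from_acceptance_set(*single, alpha)
var_pair = var_from_acceptance_set(*pair, alpha)
print(var_single, var_pair)
```

In this illustration the VaR of the sum exceeds the sum of the individual VaRs, which is consistent with the fact that the acceptance set of Example 8.7 is not convex in general.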


Example 8.8 (risk measures based on loss functions). Consider a function
$\ell \colon \mathbb{R} \to \mathbb{R}$ that is strictly increasing and convex and some
threshold $c \in \mathbb{R}$. Assume that $E(\ell(L))$ is finite for all $L \in \mathcal{M}$
and define an acceptance set by 

$$\mathcal{A} = \{L \in \mathcal{M} : E(\ell(L)) \le \ell(c)\}$$ 

and the associated risk measure by 

$$\varrho_{\mathcal{A}}(L) = \inf\{m \in \mathbb{R} : E(\ell(L - m)) \le \ell(c)\}.$$ 

In this context $\ell$ is called a loss function because the convexity of $\ell$ serves to
penalize large losses; a loss function can be derived from a utility function $u$ (a
strictly increasing and concave function) by setting $\ell(x) = -u(-x)$. 

The set $\mathcal{A}$ obviously satisfies (8.2), so that $\varrho_{\mathcal{A}}$ is
translation invariant and monotone by Proposition 8.4. Furthermore, $\mathcal{A}$ is
convex. This can be seen by considering acceptable positions $L_1$ and $L_2$ and observing
that the convexity of $\ell$ implies 

$$E(\ell(\gamma L_1 + (1 - \gamma) L_2)) \le E(\gamma \ell(L_1) + (1 - \gamma)\ell(L_2))
    \le \gamma \ell(c) + (1 - \gamma)\ell(c) = \ell(c),$$ 

where we have used the fact that $E(\ell(L_i)) \le \ell(c)$ for acceptable positions. Hence
$\gamma L_1 + (1 - \gamma) L_2 \in \mathcal{A}$ as required, and $\varrho_{\mathcal{A}}$ is a
convex measure of risk by Proposition 8.6. 

As a specific example we take $\ell(x) = e^{\alpha x}$ for some $\alpha > 0$. We get 

$$\varrho_{\alpha,c}(L) := \inf\{m : E(e^{\alpha(L - m)}) \le e^{\alpha c}\}
    = \inf\{m : E(e^{\alpha L}) \le e^{\alpha c + \alpha m}\}
    = \frac{1}{\alpha} \ln\{E(e^{\alpha L})\} - c.$$ 

Note that $\varrho_{\alpha,c}(0) = -c$, which could not be true for a coherent risk measure
unless $c = 0$. Consider the special case where $c = 0$ and write
$\varrho_\alpha := \varrho_{\alpha,0}$. We find that for $\lambda > 1$, 

$$\varrho_\alpha(\lambda L) = \frac{1}{\alpha} \ln\{E(e^{\alpha \lambda L})\}
    \ge \frac{1}{\alpha} \ln\{E(e^{\alpha L})^{\lambda}\} = \lambda \varrho_\alpha(L),$$ 

where the inequality is strict if the rv $L$ is non-degenerate. This shows that
$\varrho_\alpha$ is convex but not coherent. In insurance mathematics this risk measure is
known as the exponential premium principle if the losses $L$ are interpreted as claims. 
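
The failure of positive homogeneity for the exponential premium principle is easy to observe numerically. The sketch below assumes a hypothetical three-point loss distribution and the case c = 0; the function name `exponential_premium` and all numerical values are our own illustrative choices.

```python
import math


def exponential_premium(values, probs, a):
    """(1/a) * ln E(exp(a * L)) for a discrete loss L (threshold c = 0)."""
    return math.log(sum(p * math.exp(a * v) for v, p in zip(values, probs))) / a


# Hypothetical discrete loss distribution.
values, probs = [-1.0, 0.0, 2.0], [0.3, 0.5, 0.2]
a, lam = 0.5, 3.0

scaled_risk = exponential_premium([lam * v for v in values], probs, a)
risk_scaled = lam * exponential_premium(values, probs, a)
print(scaled_risk, risk_scaled)  # first value is the larger one

# For a degenerate (constant) loss the two quantities coincide.
const_scaled = exponential_premium([lam * 2.0], [1.0], a)
const_risk = lam * exponential_premium([2.0], [1.0], a)
```

Scaling a non-degenerate position by a factor larger than one increases the required capital more than proportionally, which is the strict inequality stated in Example 8.8.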


Example 8.9 (stress-test or worst-case risk measure). Given a set of stress scenarios
$S \subset \Omega$, a definition of a stress-test risk measure is 

$$\varrho(L) = \sup\{L(\omega) : \omega \in S\},$$ 

i.e. the worst loss when we restrict our attention to those elements of the sample space
$\Omega$ that belong to $S$. The associated acceptance set is
$\mathcal{A}_\varrho = \{L : L(\omega) \le 0 \text{ for all } \omega \in S\}$, i.e. the losses
that are non-positive for all stress scenarios. The crucial part of defining the risk
measure is the choice of the scenario set $S$, which is often guided by probabilistic
considerations involving the underlying measure $P$. 


Example 8.10 (generalized scenarios). Consider a set $\mathcal{Q}$ of probability measures
on $(\Omega, \mathcal{F})$ and a mapping $\gamma \colon \mathcal{Q} \to \mathbb{R}$ such that
$\inf\{\gamma(Q) : Q \in \mathcal{Q}\} > -\infty$. Suppose that
$\sup_{Q \in \mathcal{Q}} E_Q(|L|) < \infty$ for all $L \in \mathcal{M}$. Define a risk
measure $\varrho$ by 

$$\varrho(L) = \sup\{E_Q(L) - \gamma(Q) : Q \in \mathcal{Q}\}.$$ (8.6) 

Note that a measure $Q \in \mathcal{Q}$ such that $\gamma(Q)$ is large is penalized in the
maximization in (8.6), so $\gamma$ can be interpreted as a penalty function specifying the
relevance of the various measures in $\mathcal{Q}$. The corresponding acceptance set is
given by 

$$\mathcal{A}_\varrho = \{L \in \mathcal{M} : \sup\{E_Q(L) - \gamma(Q) : Q \in \mathcal{Q}\} \le 0\}.$$ 

$\mathcal{A}_\varrho$ is obviously convex so that $\varrho$ is a convex risk measure. In
fact, every convex risk measure can be represented in the form (8.6), as will be shown in
Theorem 8.11 below (at least for the case of finite $\Omega$). In the case where
$\gamma(\cdot) \equiv 0$ on $\mathcal{Q}$, $\varrho$ is obviously positive homogeneous and
therefore coherent. 

The stress-test risk measure of Example 8.9 is a special case of (8.6) in which the penalty
function is equal to zero and in which $\mathcal{Q}$ is the set of all Dirac measures
$\delta_\omega(\cdot)$, $\omega \in S$, i.e. measures such that
$\delta_\omega(B) = I_B(\omega)$ for arbitrary measurable sets $B \subset \Omega$. Since
in (8.6) we may choose more general sets of probability measures than simply Dirac
measures, risk measures of the form (8.6) are frequently referred to as generalized
scenario risk measures. 
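
On a finite sample space the supremum in (8.6) is a maximum over finitely many numbers, which makes the definition easy to experiment with. The following sketch assumes a hypothetical three-point sample space; the function name `generalized_scenario`, the measures and the penalties are illustrative choices of ours. It evaluates a stress-test risk measure via Dirac measures and a penalized version with two general measures.

```python
def generalized_scenario(loss, measures, penalties):
    """max over Q of E_Q(L) - gamma(Q) on a finite sample space, as in (8.6)."""
    return max(
        sum(q * l for q, l in zip(Q, loss)) - g
        for Q, g in zip(measures, penalties)
    )


# Loss on a hypothetical sample space with three outcomes.
loss = [3.0, -1.0, 0.5]

# Stress test with S = {omega_1, omega_3}: Dirac measures and zero penalty.
dirac = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
stress = generalized_scenario(loss, dirac, [0.0, 0.0])

# Two general probability measures, the second with a positive penalty.
measures = [[0.2, 0.5, 0.3], [0.6, 0.2, 0.2]]
penalized = generalized_scenario(loss, measures, [0.0, 1.0])

# Translation invariance: adding a constant m shifts the risk by m.
shifted = generalized_scenario([l + 2.0 for l in loss], measures, [0.0, 1.0])
print(stress, penalized, shifted)
```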
8.1.2 Dual Representation of Convex Measures of Risk 


In this section we show that convex measures of risk have dual representations as 
generalized scenario risk measures. We state and prove a theorem in the simpler 
setting of a finite probability space. However, the result can be extended to general 
probability spaces by imposing additional continuity conditions on the risk measure. 
See, for instance, Sections 4.2 and 4.3 of Föllmer and Schied (2011). 


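Theorem 8.11 (dual representation for risk measures). Suppose that $\Omega$ is a finite
probability space with $|\Omega| = n < \infty$. Let $\mathcal{F} = \mathcal{P}(\Omega)$, the
set of all subsets of $\Omega$, and take $\mathcal{M} := \{L \colon \Omega \to \mathbb{R}\}$.
Then the following hold. 

(1) Every convex risk measure $\varrho$ on $\mathcal{M}$ can be written in the form 

$$\varrho(L) = \max\{E_Q(L) - \alpha_{\min}(Q) : Q \in \mathcal{S}^1(\Omega, \mathcal{F})\},$$ (8.7) 

where $\mathcal{S}^1(\Omega, \mathcal{F})$ denotes the set of all probability measures on
$\Omega$, and where the penalty function $\alpha_{\min}$ is given by 

$$\alpha_{\min}(Q) = \sup\{E_Q(L) : L \in \mathcal{A}_\varrho\}.$$ (8.8) 

(2) If $\varrho$ is coherent, it has the representation 

$$\varrho(L) = \max\{E_Q(L) : Q \in \mathcal{Q}\}$$ (8.9) 

for some set $\mathcal{Q} = \mathcal{Q}(\varrho) \subset \mathcal{S}^1(\Omega, \mathcal{F})$. 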


Proof. The proof is divided into three steps. For simplicity we write $\mathcal{S}^1$ for
$\mathcal{S}^1(\Omega, \mathcal{F})$. 

Step 1. First we show that for $L \in \mathcal{M}$, 

$$\varrho(L) \ge \sup\{E_Q(L) - \alpha_{\min}(Q) : Q \in \mathcal{S}^1\}.$$ (8.10) 

To establish (8.10), set $L' := L - \varrho(L)$ and note that $L' \in \mathcal{A}_\varrho$ so
that 

$$\alpha_{\min}(Q) = \sup\{E_Q(L) : L \in \mathcal{A}_\varrho\} \ge E_Q(L') = E_Q(L) - \varrho(L).$$ 

This gives $\varrho(L) \ge E_Q(L) - \alpha_{\min}(Q)$, and taking the supremum over different
measures $Q$ gives (8.10). 


Step 2. The main step of the proof is now to construct for $L \in \mathcal{M}$ a measure
$Q_L \in \mathcal{S}^1$ such that $\varrho(L) \le E_{Q_L}(L) - \alpha_{\min}(Q_L)$; together
with (8.10) this establishes (8.7) and (8.8). This is the most technical part of the
argument and we give a simple illustration in Example 8.12. 

By translation invariance it is enough to construct $Q_L$ for a loss $L$ with
$\varrho(L) = 0$; moreover, we may assume without loss of generality that $\varrho(0) = 0$.
Since $|\Omega| = n < \infty$ we may identify $L$ with the vector of possible outcomes
$\ell = (L(\omega_1), \dots, L(\omega_n))'$ in $\mathbb{R}^n$. Similarly, a probability
measure $Q \in \mathcal{S}^1$ can be identified with the vector of corresponding
probabilities $q = (q(\omega_1), \dots, q(\omega_n))'$, which is an element of the unit
simplex in $\mathbb{R}^n$. We may identify $\mathcal{A}_\varrho$ with a convex subset
$A_\varrho$ of $\mathbb{R}^n$. Note that a loss $L$ with $\varrho(L) = 0$ is not in the
interior of $\mathcal{A}_\varrho$. Otherwise $L + \epsilon$ would belong to
$\mathcal{A}_\varrho$ for $\epsilon > 0$ small enough, which would imply that
$0 \ge \varrho(L + \epsilon) = \varrho(L) + \epsilon = \epsilon$, which is a contradiction.
Hence $\ell$ is not in the interior of $A_\varrho$. According to the supporting hyperplane
theorem (see Proposition A.8 in Appendix A.1.5) there therefore exists a vector
$u \in \mathbb{R}^n \setminus \{0\}$ such that 

$$u'\ell \ge \sup\{u'x : x \in A_\varrho\}.$$ (8.11) 

Below we will construct $Q_L$ from the vector $u$. First, we show that $u_j \ge 0$ for all
$1 \le j \le n$. Since $\varrho(L) = 0$, by monotonicity and translation invariance we have
$\varrho(L - 1 - \lambda I_{\{\omega_j\}}) < 0$ for all $j$ and all $\lambda > 0$. This shows
that $L - 1 - \lambda I_{\{\omega_j\}} \in \mathcal{A}_\varrho$ or, equivalently,
$\ell - \mathbf{1} - \lambda e_j \in A_\varrho$, where we use the notation
$\mathbf{1} = (1, \dots, 1)'$. Applying relation (8.11) we get that
$u'\ell \ge u'(\ell - \mathbf{1} - \lambda e_j)$ for all $\lambda > 0$, which implies that 

$$0 \ge -\sum_{i=1}^{n} u_i - \lambda u_j, \quad \forall \lambda > 0.$$ 

This is possible only for $u_j \ge 0$. Since $u$ is different from zero, at least one of the
components must be strictly positive, and we can define the vector
$q := u / (\sum_{j=1}^{n} u_j)$. Note that $q$ belongs to the unit simplex in
$\mathbb{R}^n$, and we define $Q_L$ to be the associated probability measure. It remains to
verify that $Q_L$ satisfies the inequality
$\varrho(L) \le E_{Q_L}(L) - \alpha_{\min}(Q_L)$; since we assumed $\varrho(L) = 0$, we need
to show that 

$$E_{Q_L}(L) \ge \alpha_{\min}(Q_L).$$ (8.12) 

We have $\alpha_{\min}(Q_L) = \sup\{E_{Q_L}(X) : X \in \mathcal{A}_\varrho\} = \sup\{q'x : x \in A_\varrho\}$.
It follows from (8.11) that 

$$E_{Q_L}(L) = q'\ell \ge \sup\{q'x : x \in A_\varrho\} = \alpha_{\min}(Q_L),$$ 

and hence we obtain the desired relation (8.12). 


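Step 3. In order to establish the representation (8.9) for coherent risk measures, we
recall that for a coherent risk measure $\varrho$ the acceptance set $\mathcal{A}_\varrho$ is
a convex cone. Hence for $\lambda > 0$ we obtain 

$$\alpha_{\min}(Q) = \sup_{L \in \mathcal{A}_\varrho} E_Q(L)
    = \sup_{\lambda L \in \mathcal{A}_\varrho} E_Q(\lambda L) = \lambda \alpha_{\min}(Q),$$ 

which is possible only for $\alpha_{\min}(Q) \in \{0, \infty\}$. The representation (8.9)
follows if we set $\mathcal{Q}(\varrho) := \{Q \in \mathcal{S}^1 : \alpha_{\min}(Q) = 0\}$. 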


Example 8.12. To illustrate the construction in step (2) of the proof of Theorem 8.11
we give a simple example (see Figure 8.1 to visualize the construction). 

Consider the case where $n = 2$ and where the risk measure is $\varrho(L) = \ln E(e^{L})$,
the exponential premium principle of Example 8.8 with $\alpha = 1$ and $c = 0$. Assume that
the probability measure $P$ is given by $p = (0.4, 0.6)$, in which case the losses $L$ for
which $\varrho(L) = 0$ are represented by the curve with equation
$f_\varrho(\ell) := \ln(0.4 e^{\ell_1} + 0.6 e^{\ell_2}) = 0$; the acceptance set
$\mathcal{A}_\varrho$ consists of the curve and the region below it. We now construct $Q_L$
for the loss $L$ on the curve with $L(\omega_1) = \ell_1 = 0.5$ and
$L(\omega_2) = \ell_2 \approx -0.566$. A possible choice for the vector $u$ in (8.11) (the
normal vector of the supporting hyperplane) is to take 

$$u = \nabla f_\varrho(\ell) = \left( \frac{\partial f_\varrho}{\partial \ell_1}(\ell), \frac{\partial f_\varrho}{\partial \ell_2}(\ell) \right)'.$$ 

By normalization of $u$ the measure $Q_L$ may be identified with $q \approx (0.659, 0.341)$,
and for this measure the penalty is $\alpha_{\min}(Q_L) = q'\ell \approx 0.137$. 

Figure 8.1. Illustration of the measure construction in step (2) 
of the proof of Theorem 8.11 (see Example 8.12 for details). 
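
The numbers quoted in Example 8.12 can be reproduced in a few lines. The sketch below is our own check rather than part of the original example, and the variable names are hypothetical. It uses the fact that the partial derivative of ln(0.4 e^{l_1} + 0.6 e^{l_2}) with respect to l_i is p_i e^{l_i} divided by the sum inside the logarithm, and that this sum equals one on the curve.

```python
import math

p = (0.4, 0.6)
l1 = 0.5
# Solve ln(p1 * e^{l1} + p2 * e^{l2}) = 0 for l2.
l2 = math.log((1.0 - p[0] * math.exp(l1)) / p[1])

# Gradient of f at (l1, l2): components p_i * e^{l_i} / (p1 e^{l1} + p2 e^{l2}).
denom = p[0] * math.exp(l1) + p[1] * math.exp(l2)
u = (p[0] * math.exp(l1) / denom, p[1] * math.exp(l2) / denom)

# Normalize u to obtain the probability vector q, then compute the penalty q'l.
q = tuple(ui / sum(u) for ui in u)
penalty = q[0] * l1 + q[1] * l2

print(round(l2, 3), [round(qi, 3) for qi in q], round(penalty, 3))
```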


Properties of $\alpha_{\min}$. Next we discuss properties of the penalty function
$\alpha_{\min}$. First we explain that $\alpha_{\min}$ is a minimal penalty function
representing $\varrho$. Suppose that
$\alpha \colon \mathcal{S}^1(\Omega, \mathcal{F}) \to \mathbb{R}$ is any function such
that (8.7) holds for all $L \in \mathcal{M}$. Then for
$Q \in \mathcal{S}^1(\Omega, \mathcal{F})$, $L \in \mathcal{M}$ fixed we have
$\varrho(L) \ge E_Q(L) - \alpha(Q)$ and hence also 

$$\alpha(Q) \ge \sup_{L \in \mathcal{M}} \{E_Q(L) - \varrho(L)\}
    \ge \sup_{L \in \mathcal{A}_\varrho} \{E_Q(L) - \varrho(L)\}
    \ge \sup_{L \in \mathcal{A}_\varrho} E_Q(L) = \alpha_{\min}(Q).$$ (8.13) 

Next we give an alternative representation of $\alpha_{\min}$ that will be useful in the
analysis of risk measures for linear portfolios. Consider some $L \in \mathcal{M}$. Since
$L - \varrho(L) \in \mathcal{A}_\varrho$ we get that 

$$\sup_{L \in \mathcal{M}} \{E_Q(L) - \varrho(L)\} = \sup_{L \in \mathcal{M}} \{E_Q(L - \varrho(L))\}
    \le \sup_{L \in \mathcal{A}_\varrho} E_Q(L) = \alpha_{\min}(Q).$$ 

Combining this with the previous estimate (8.13) we obtain the relation 

$$\alpha_{\min}(Q) = \sup_{L \in \mathcal{M}} \{E_Q(L) - \varrho(L)\}.$$ (8.14) 


8.1.3 Examples of Dual Representations 


In this section we give a detailed derivation of the dual representation for expected 
shortfall, and we briefly discuss the dual representation for the risk measure based 
on the exponential loss function. 


Expected shortfall. Recall from Section 2.3.4 that expected shortfall is given by 

$$\mathrm{ES}_\alpha(L) = \frac{1}{1 - \alpha} \int_\alpha^1 q_u(L) \, du, \quad \alpha \in [0, 1),$$ 

for integrable losses $L$. The following proposition gives alternative expressions for
$\mathrm{ES}_\alpha$. 


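Proposition 8.13. For $0 < \alpha < 1$ we have 

$$\mathrm{ES}_\alpha(L) = \frac{1}{1 - \alpha} E((L - q_\alpha(L))^+) + q_\alpha(L)$$ (8.15) 

$$\phantom{\mathrm{ES}_\alpha(L)} = \frac{1}{1 - \alpha} \bigl( E(L; L > q_\alpha(L)) + q_\alpha(L)(1 - \alpha - P(L > q_\alpha(L))) \bigr).$$ (8.16) 

Proof. Recall that for $U \sim U(0, 1)$ the random variable $F_L^{\leftarrow}(U)$ has
distribution function $F_L$ (see Proposition 7.2). Since
$q_\alpha(L) = F_L^{\leftarrow}(\alpha)$, (8.15) follows from observing that 

$$\frac{1}{1 - \alpha} E((L - q_\alpha(L))^+)
    = \frac{1}{1 - \alpha} \int_0^1 (F_L^{\leftarrow}(u) - F_L^{\leftarrow}(\alpha))^+ \, du
    = \frac{1}{1 - \alpha} \int_\alpha^1 (F_L^{\leftarrow}(u) - F_L^{\leftarrow}(\alpha)) \, du
    = \frac{1}{1 - \alpha} \int_\alpha^1 q_u(L) \, du - q_\alpha(L).$$ 

For (8.16) we use
$E((L - q_\alpha(L))^+) = E(L; L > q_\alpha(L)) - q_\alpha(L) P(L > q_\alpha(L))$. 

If $F_L$ is continuous at $q_\alpha(L)$, then $P(L > q_\alpha(L)) = 1 - \alpha$ and 

$$\mathrm{ES}_\alpha(L) = \frac{E(L; L > q_\alpha(L))}{1 - \alpha} = E(L \mid L > \mathrm{VaR}_\alpha)$$ 

(see also Lemma 2.13). 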
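
For a discrete loss distribution the three expressions for expected shortfall (the integral definition, (8.15) and (8.16)) can all be evaluated exactly and compared. The sketch below does this for a hypothetical three-point distribution; the helper names `quantile`, `es_integral`, `es_815` and `es_816` are our own.

```python
def quantile(values, probs, alpha):
    """Generalized inverse: smallest x with P(L <= x) >= alpha."""
    cum = 0.0
    for x, p in sorted(zip(values, probs)):
        cum += p
        if cum >= alpha - 1e-12:
            return x
    return max(values)


def es_integral(values, probs, alpha):
    """(1/(1-alpha)) * integral of q_u(L) over (alpha, 1], summed atom by atom."""
    total, lower = 0.0, 0.0
    for x, p in sorted(zip(values, probs)):
        upper = lower + p
        # q_u(L) = x for u in (lower, upper]; keep only the part above alpha.
        total += x * max(0.0, upper - max(lower, alpha))
        lower = upper
    return total / (1.0 - alpha)


def es_815(values, probs, alpha):
    q = quantile(values, probs, alpha)
    excess = sum(p * max(v - q, 0.0) for v, p in zip(values, probs))
    return excess / (1.0 - alpha) + q


def es_816(values, probs, alpha):
    q = quantile(values, probs, alpha)
    partial = sum(p * v for v, p in zip(values, probs) if v > q)
    p_above = sum(p for v, p in zip(values, probs) if v > q)
    return (partial + q * (1.0 - alpha - p_above)) / (1.0 - alpha)


# Hypothetical loss with an atom at the 0.95-quantile.
values = [0.0, 100.0, 200.0]
probs = [0.9216, 0.0768, 0.0016]
alpha = 0.95
results = [f(values, probs, alpha) for f in (es_integral, es_815, es_816)]
print(results)
```

Because the distribution has an atom at the quantile, the correction term in (8.16) is non-zero here and the conditional-expectation formula for continuous $F_L$ would not apply.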
Theorem 8.14. For $\alpha \in [0, 1)$, $\mathrm{ES}_\alpha$ defines a coherent measure of
risk on $\mathcal{M} = L^1(\Omega, \mathcal{F}, P)$. The dual representation is given by 

$$\mathrm{ES}_\alpha(L) = \max\{E^Q(L) : Q \in \mathcal{Q}_\alpha\},$$ (8.17) 

where $\mathcal{Q}_\alpha$ is the set of all probability measures on $(\Omega, \mathcal{F})$
that are absolutely continuous with respect to $P$ and for which the measure-theoretic
density $dQ/dP$ is bounded by $(1 - \alpha)^{-1}$. 

Proof. For $\alpha = 0$ one has $\mathcal{Q}_0 = \{P\}$ and the relation (8.17) is obvious so
we consider the case where $\alpha > 0$. By translation invariance we can assume without
loss of generality that $q_\alpha(L) > 0$. Since expected shortfall is only concerned with
the upper tail, we can also assume that $L \ge 0$. (If it were not, we could simply define
$\tilde{L} = \max(L, 0)$ and observe that $\mathrm{ES}_\alpha(L) = \mathrm{ES}_\alpha(\tilde{L})$.) 

Define a coherent measure $\varrho_\alpha$ by 

$$\varrho_\alpha(L) = \sup\{E^Q(L) : Q \in \mathcal{Q}_\alpha\}.$$ (8.18) 

We want to show that $\varrho_\alpha(L)$ can be written in the form of Proposition 8.13. As a
first step we transform the optimization problem in (8.18). The measures in the set
$\mathcal{Q}_\alpha$ can alternatively be described in terms of their measure-theoretic
density and hence by the set of random variables
$\{\psi : 0 \le \psi \le 1/(1 - \alpha),\ E(\psi) = 1\}$. We therefore have 

$$\varrho_\alpha(L) = \sup\Bigl\{E(L\psi) : 0 \le \psi \le \frac{1}{1 - \alpha},\ E(\psi) = 1\Bigr\}.$$ 

Transforming these random variables according to $\varphi = (1 - \alpha)\psi$ and factoring
out the expression $E(L)$ we get 

$$\varrho_\alpha(L) = \frac{E(L)}{1 - \alpha} \sup\Bigl\{E\Bigl(\frac{L}{E(L)}\varphi\Bigr) : 0 \le \varphi \le 1,\ E(\varphi) = 1 - \alpha\Bigr\}.$$ 

It follows that 

$$\varrho_\alpha(L) = \frac{E(L)}{1 - \alpha} \sup\{\tilde{E}(\varphi) : 0 \le \varphi \le 1,\ E(\varphi) = 1 - \alpha\},$$ (8.19) 

where the measure $\tilde{P}$ is defined by $d\tilde{P}/dP = L/E(L)$ and $\tilde{E}$ denotes
expectation with respect to $\tilde{P}$. 

We now show that the supremum in (8.19) is attained by the random variable 

$$\varphi_0 = I_{\{L > q_\alpha(L)\}} + \kappa I_{\{L = q_\alpha(L)\}},$$ (8.20) 

where $\kappa \ge 0$ is chosen such that $E(\varphi_0) = 1 - \alpha$. To verify the
optimality of $\varphi_0$ consider an arbitrary $0 \le \varphi \le 1$ with
$E(\varphi) = 1 - \alpha$. By definition of $\varphi_0$, we must have the inequality 

$$0 \le (\varphi_0 - \varphi)(L - q_\alpha(L)),$$ (8.21) 

as the first factor is nonnegative for $L > q_\alpha(L)$ and nonpositive for
$L < q_\alpha(L)$. Integration of (8.21) gives 

$$0 \le E((\varphi_0 - \varphi)(L - q_\alpha(L))) = E(L(\varphi_0 - \varphi)) - q_\alpha(L) E(\varphi_0 - \varphi).$$ 

The second term now vanishes as $E(\varphi_0) = E(\varphi) = 1 - \alpha$, and the first term
equals $E(L)\tilde{E}(\varphi_0 - \varphi)$. It follows that
$\tilde{E}(\varphi_0) \ge \tilde{E}(\varphi)$, verifying the optimality of $\varphi_0$.
Inserting $\varphi_0$ in (8.19) gives 

$$\varrho_\alpha(L) = \frac{E(L)}{1 - \alpha} \tilde{E}(\varphi_0) = \frac{1}{1 - \alpha} E(L\varphi_0)
    = \frac{1}{1 - \alpha} \bigl( E(L; L > q_\alpha(L)) + \kappa q_\alpha(L) P(L = q_\alpha(L)) \bigr).$$ (8.22) 

The condition $E(\varphi_0) = 1 - \alpha$ yields that
$P(L > q_\alpha(L)) + \kappa P(L = q_\alpha(L)) = 1 - \alpha$ and hence that 

$$\kappa = \frac{(1 - \alpha) - P(L > q_\alpha(L))}{P(L = q_\alpha(L))},$$ 

where we use the convention $0/0 = 0$. Inserting $\kappa$ in (8.22) gives (8.16), which
proves the theorem. 


Remark 8.15. Readers familiar with mathematical statistics may note that the construction
of $\varphi_0$ in the optimization problem (8.19) is similar to the construction of the
optimal test in the well-known Neyman-Pearson Lemma. The proof shows that for
$\alpha \in (0, 1)$ and integrable $L$, a measure $Q_L \in \mathcal{Q}_\alpha$ that attains
the supremum in the dual representation of $\mathrm{ES}_\alpha$ is given by the
measure-theoretic density 

$$\frac{dQ_L}{dP} = \frac{\varphi_0}{1 - \alpha} = \frac{1}{1 - \alpha} \bigl( I_{\{L > q_\alpha(L)\}} + \kappa I_{\{L = q_\alpha(L)\}} \bigr),$$ 

where 

$$\kappa = \begin{cases}
    \dfrac{(1 - \alpha) - P(L > q_\alpha(L))}{P(L = q_\alpha(L))}, & P(L = q_\alpha(L)) > 0, \\
    0, & P(L = q_\alpha(L)) = 0.
\end{cases}$$ 
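
The maximizing density of Remark 8.15 can be written down explicitly on a finite sample space. The following sketch, which assumes a hypothetical three-point loss distribution and uses helper names of our own choosing, builds the density, checks that it defines a probability measure with density bounded by $(1 - \alpha)^{-1}$, and evaluates the expected loss under that measure.

```python
def quantile(values, probs, alpha):
    """Generalized inverse: smallest x with P(L <= x) >= alpha."""
    cum = 0.0
    for x, p in sorted(zip(values, probs)):
        cum += p
        if cum >= alpha - 1e-12:
            return x
    return max(values)


def es_density(values, probs, alpha):
    """Density dQ_L/dP of Remark 8.15, evaluated at each outcome of L."""
    q = quantile(values, probs, alpha)
    p_above = sum(p for v, p in zip(values, probs) if v > q)
    p_at = sum(p for v, p in zip(values, probs) if v == q)
    kappa = (1.0 - alpha - p_above) / p_at if p_at > 0 else 0.0
    return [
        (1.0 if v > q else kappa if v == q else 0.0) / (1.0 - alpha)
        for v in values
    ]


# Hypothetical discrete loss distribution.
values = [0.0, 100.0, 200.0]
probs = [0.9216, 0.0768, 0.0016]
alpha = 0.95

density = es_density(values, probs, alpha)
total_mass = sum(p * d for p, d in zip(probs, density))
expected_under_q = sum(p * d * v for p, d, v in zip(probs, density, values))
expected_under_p = sum(p * v for p, v in zip(probs, values))
print(total_mass, max(density), expected_under_q, expected_under_p)
```

In this illustration the expected loss under the constructed measure equals the expected shortfall at level 0.95, while the reference measure P, which also belongs to the admissible set, gives a smaller value.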


Risk measure for the exponential loss function. We end this section by giving without
proof the dual representation for the risk measure derived from the acceptance set of the
exponential loss function, given by 

$$\varrho_\alpha(L) = \frac{1}{\alpha} \ln\{E(e^{\alpha L})\}$$ 

(see Example 8.8 for details). In this case we have a non-zero penalty function since
$\varrho_\alpha$ is convex but not coherent. It can be shown that 

$$\varrho_\alpha(L) = \max_{Q \in \mathcal{S}^1} \Bigl\{ E^Q(L) - \frac{1}{\alpha} H(Q \mid P) \Bigr\},$$ 

where 

$$H(Q \mid P) = \begin{cases}
    E^Q\Bigl(\ln \dfrac{dQ}{dP}\Bigr), & Q \ll P, \\
    \infty, & \text{otherwise},
\end{cases}$$ 

is known as relative entropy between $P$ and $Q$. In other words, the penalty function
is proportional to the relative entropy between the two measures. The proof of this
fact requires a detailed study of the properties of relative entropy and is therefore
omitted (see Sections 3.2 and 4.9 of Föllmer and Schied (2011) for details). 
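
Although the proof is omitted, the representation is easy to check numerically on a finite sample space. The sketch below relies on a standard fact that is not stated in the text: the maximum is attained by the exponentially tilted measure with probabilities proportional to $p_i e^{\alpha L(\omega_i)}$. The loss values, the parameter and the function names are hypothetical choices of ours.

```python
import math


def entropic_risk(loss, p, a):
    """(1/a) * ln E_P(exp(a * L)) on a finite sample space."""
    return math.log(sum(pi * math.exp(a * li) for li, pi in zip(loss, p))) / a


def relative_entropy(q, p):
    """H(Q | P) = sum q_i * ln(q_i / p_i), with the convention 0 * ln 0 = 0."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)


def dual_objective(loss, q, p, a):
    """E_Q(L) - H(Q | P) / a, the quantity maximized in the dual representation."""
    return sum(qi * li for qi, li in zip(q, loss)) - relative_entropy(q, p) / a


def tilted_measure(loss, p, a):
    """Probabilities proportional to p_i * exp(a * L_i) (assumed maximizer)."""
    weights = [pi * math.exp(a * li) for li, pi in zip(loss, p)]
    total = sum(weights)
    return [w / total for w in weights]


p = [0.4, 0.6]
loss = [0.5, -0.2]
a = 2.0

risk = entropic_risk(loss, p, a)
best = dual_objective(loss, tilted_measure(loss, p, a), p, a)
others = [dual_objective(loss, q, p, a) for q in ([0.4, 0.6], [0.9, 0.1], [1.0, 0.0])]
print(risk, best, others)
```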


Notes and Comments 


The classic paper on coherent risk measures is Artzner et al. (1999); a non-technical 
introduction by the same authors is Artzner et al. (1997). Technical extensions such as 
the characterization of coherent risk measures on infinite probability spaces are given 
in Delbaen (2000, 2002, 2012). Stress testing as an approach to risk measurement 
is studied by Berkowitz (2000), Kupiec (1998), Breuer et al. (2009) and Rebonato 
(2010), among others. 

The study of convex risk measures in the context of risk management and mathematical 
finance began with Föllmer and Schied (2002) (see also Frittelli 2002). A good treatment 
at advanced textbook level is given in Chapter 4 of Föllmer and Schied (2011). Cont (2006) 
provides an interesting link between convex risk measures and model risk in the pricing 
of derivatives. An alternative proof of Theorem 8.11 can be based on the duality theorem 
for convex functions (see, for instance, Remark 4.18 of Föllmer and Schied (2011)). 

Different existing notions of expected shortfall are discussed in the very readable 
paper by Acerbi and Tasche (2002). Expected shortfall has been independently 
studied by Rockafellar and Uryasev (2000, 2002) under the name conditional value- 
at-risk; in particular, these papers develop the idea that expected shortfall can be 
obtained as the value of a convex optimization problem. 

There has been recent interest in the subject of multi-period risk measures, which 
take into account the evolution of the final value of a position over several time 
periods and consider the effect of intermediate information and actions. Important 
papers in this area include Artzner et al. (2007), Riedel (2004), Weber (2006) and 
Cheridito, Delbaen and Kupper (2005). A textbook treatment and further references 
can be found in Chapter 11 of Föllmer and Schied (2011). 


8.2 Law-Invariant Coherent Risk Measures 


A risk measure $\varrho$ is termed law invariant if $\varrho(L)$ depends on $L$ only via its df $F_L$; 
examples of law-invariant risk measures are VaR and expected shortfall. On the other 
hand, the stress-test risk measures of Example 8.9 are typically not law invariant. In 
this section we discuss a number of law-invariant and coherent risk measures that 
are frequently used in financial and actuarial studies. 


8.2.1 Distortion Risk Measures 


The class of distortion risk measures is an important class of coherent risk measures. 
These risk measures are presented in many different ways in the literature, and a 
variety of different names are used. We begin by summarizing the more impor- 
tant representations before investigating the properties of distortion risk measures. 
Finally, we consider certain parametric families of distortion risk measures. 


Representations of distortion risk measures. We begin with a general definition. 

Definition 8.16 (distortion risk measure). 

(1) A convex distortion function $D$ is a convex, increasing and absolutely continuous
function on $[0, 1]$ satisfying $D(0) = 0$ and $D(1) = 1$. 

(2) The distortion risk measure associated with a convex distortion function $D$ is
defined by 

$$\varrho(L) = \int_0^1 q_u(L) \, dD(u).$$ (8.23) 

Note that every convex distortion function is a distribution function on $[0, 1]$.
The simplest example of a distortion risk measure is expected shortfall, which is
obtained by taking the convex distortion function
$D_\alpha(u) = (1 - \alpha)^{-1}(u - \alpha)^+$. Clearly, a distortion risk measure is law
invariant (it depends on the rv $L$ only via the distribution of $L$), as it is defined as
an average of the quantiles of $L$. 

Since a convex distortion function $D$ is absolutely continuous by definition, it can be
written in the form $D(u) = \int_0^u \phi(s) \, ds$ for an increasing, positive function
$\phi$ (the right derivative of $D$). This yields the alternative representation 

$$\varrho(L) = \int_0^1 q_u(L) \phi(u) \, du.$$ (8.24) 

A risk measure of the form (8.24) is also known as a spectral risk measure, and the
function $\phi$ is called the spectrum. It can be thought of as a weighting function
applied to the quantiles of the distribution of $L$. In the case of expected shortfall,
the spectrum is $\phi(u) = (1 - \alpha)^{-1} I_{\{u \ge \alpha\}}$, showing that an equal
weight is placed on all quantiles beyond the $\alpha$-quantile. 

A second alternative representation is derived in the following lemma. 


Lemma 8.17. The distortion risk measure $\varrho$ associated with a convex distortion
function $D$ can be written in the form 

$$\varrho(L) = \int_{\mathbb{R}} x \, dD \circ F_L(x),$$ (8.25) 

where $D \circ F_L$ denotes the composition of the functions $D$ and $F_L$, that is,
$D \circ F_L(x) = D(F_L(x))$. 

Proof. Let $G(x) = D \circ F_L(x)$ and note that $G$ is itself a distribution function. The
associated quantile function is given by
$G^{\leftarrow} = F_L^{\leftarrow} \circ D^{\leftarrow}$, as can be seen by using
Proposition A.3 (iv) to write 

$$G^{\leftarrow}(v) = \inf\{x : D \circ F_L(x) \ge v\} = \inf\{x : F_L(x) \ge D^{\leftarrow}(v)\}$$ 

and noting that this equals $F_L^{\leftarrow} \circ D^{\leftarrow}(v)$ by definition of the
generalized inverse. The right-hand side of (8.25) can therefore be written in the form 

$$\int_{\mathbb{R}} x \, dG(x) = \int_0^1 G^{\leftarrow}(u) \, du
    = \int_0^1 F_L^{\leftarrow} \circ D^{\leftarrow}(u) \, du
    = E(F_L^{\leftarrow} \circ D^{\leftarrow}(U)),$$ 

where $U$ is a standard uniform random variable. Now introduce the random variable
$V = D^{\leftarrow}(U)$, which has df $D$. We have shown that 

$$\int_{\mathbb{R}} x \, dG(x) = E(F_L^{\leftarrow}(V)) = \int_0^1 F_L^{\leftarrow}(v) \, dD(v)$$ 

and thus established the result. 


The representation (8.25) gives more intuition for the idea of a distortion. The original
df $F_L$ is distorted by the function $D$. Moreover, for $u \in (0, 1)$ we note that
$D(u) \le u$, by the convexity of $D$, so that the distorted df $G = D \circ F_L$ places
more mass on high values of $L$ than the original df $F_L$. 

Finally, we show that a distortion risk measure can be represented as a weighted average
of expected shortfall over different confidence levels. To do this we fix a convex
distortion $D$ with associated distortion risk measure $\varrho$ and spectrum $\phi$
(see (8.24)). From now on we work with the right-continuous version of $\phi$. Since
$\phi$ is increasing we can obtain a measure $\nu$ on $[0, 1]$ by setting
$\nu([0, t]) := \phi(t)$ for $0 \le t \le 1$. Note that for every function
$f \colon [0, 1] \to \mathbb{R}$ we have 

$$\int_0^1 f(\alpha) \, d\nu(\alpha) = f(0)\phi(0) + \int_0^1 f(\alpha) \, d\phi(\alpha).$$ 

Moreover, we now define a further measure $\mu$ on $[0, 1]$ by setting 

$$\frac{d\mu}{d\nu}(\alpha) = (1 - \alpha), \quad \text{that is,} \quad
    \int_0^1 f(\alpha) \, d\mu(\alpha) = f(0)\phi(0) + \int_0^1 f(\alpha)(1 - \alpha) \, d\phi(\alpha).$$ (8.26) 

Now we may state the representation result for $\varrho$. 


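Proposition 8.18. Let $\varrho$ be a distortion risk measure associated with the convex
distortion $D$ and define the measure $\mu$ by (8.26). Then $\mu$ is a probability measure
and we have the representation 

$$\varrho(L) = \int_0^1 \mathrm{ES}_\alpha(L) \, d\mu(\alpha).$$ 

Proof. Using integration by parts and (8.26) we check that 

$$\mu([0, 1]) = \phi(0) + \int_0^1 (1 - \alpha) \, d\phi(\alpha)
    = \phi(0) + \int_0^1 \phi(\alpha) \, d\alpha - \phi(0)
    = \int_0^1 \phi(\alpha) \, d\alpha = D(1) - D(0) = 1,$$ 

which shows that $\mu$ is a probability measure. Next we turn to the representation
result for $\varrho$. By Fubini's Theorem we have that 

$$\varrho(L) = \int_0^1 q_u(L) \phi(u) \, du = \int_0^1 q_u(L) \int_0^u 1 \, d\nu(\alpha) \, du
    = \int_0^1 \int_0^1 q_u(L) I_{\{\alpha \le u\}} \, d\nu(\alpha) \, du
    = \int_0^1 \int_\alpha^1 q_u(L) \, du \, d\nu(\alpha)$$ 

$$\phantom{\varrho(L)} = \int_0^1 (1 - \alpha) \mathrm{ES}_\alpha(L) \, d\nu(\alpha)
    = \int_0^1 \mathrm{ES}_\alpha(L) \, d\mu(\alpha).$$ 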
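
Proposition 8.18 can be illustrated with a piecewise-constant spectrum. In the sketch below we assume a spectrum of the form $\phi(u) = \sum_k w_k (1 - \alpha_k)^{-1} I_{\{u \ge \alpha_k\}}$ with weights $w_k$ summing to one, for which the measure $\mu$ places mass $w_k$ at $\alpha_k$. The loss distribution, levels, weights and function names are hypothetical. Both sides of the representation are computed exactly for a discrete loss.

```python
def quantile(values, probs, u):
    """Generalized inverse: smallest x with P(L <= x) >= u."""
    cum = 0.0
    for x, p in sorted(zip(values, probs)):
        cum += p
        if cum >= u - 1e-12:
            return x
    return max(values)


def expected_shortfall(values, probs, alpha):
    """(1/(1-alpha)) * integral of q_u(L) over (alpha, 1], summed atom by atom."""
    total, lower = 0.0, 0.0
    for x, p in sorted(zip(values, probs)):
        upper = lower + p
        total += x * max(0.0, upper - max(lower, alpha))
        lower = upper
    return total / (1.0 - alpha)


def spectral_risk(values, probs, levels, weights):
    """Integral of q_u(L) * phi(u) over (0, 1) for the step spectrum phi."""
    def phi(u):
        return sum(w / (1.0 - a) for a, w in zip(levels, weights) if u >= a)

    # Break points where either q_u(L) or phi(u) can jump.
    cuts, cum = {0.0, 1.0, *levels}, 0.0
    for _, p in sorted(zip(values, probs)):
        cum += p
        cuts.add(min(cum, 1.0))
    pts = sorted(cuts)
    total = 0.0
    for a, b in zip(pts, pts[1:]):
        mid = 0.5 * (a + b)  # both functions are constant on (a, b)
        total += quantile(values, probs, mid) * phi(mid) * (b - a)
    return total


values, probs = [-2.0, 0.0, 1.0, 5.0], [0.1, 0.4, 0.3, 0.2]
levels, weights = [0.0, 0.5, 0.9], [0.2, 0.5, 0.3]

lhs = spectral_risk(values, probs, levels, weights)
rhs = sum(w * expected_shortfall(values, probs, a) for a, w in zip(levels, weights))
print(lhs, rhs)
```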
0 0 


Properties of distortion risk measures. Next we discuss certain properties of dis- 
tortion risk measures. Distortion risk measures are comonotone additive in the fol- 
lowing sense. 


Definition 8.19 (comonotone additivity). A risk measure $\varrho$ on a space of random
variables $\mathcal{M}$ is said to be comonotone additive if 

$$\varrho(L_1 + \cdots + L_d) = \varrho(L_1) + \cdots + \varrho(L_d)$$ 

whenever $(L_1, \dots, L_d)$ is a vector of comonotonic risks. 

The property of comonotonicity was defined in Definition 7.2.1, and it was shown 
in Proposition 7.20 that the quantile function (or, in other words, the value-at-risk 
risk measure) is additive for comonotonic risks. The comonotone additivity of the 
distortion risk measures follows easily from the fact that they can be represented as 
weighted integrals of the quantile function as in (8.23). 

Moreover, distortion risk measures are coherent. Monotonicity, translation invariance 
and positive homogeneity are obvious. To establish that they are coherent we 
only need to check subadditivity, which follows immediately from Proposition 8.18 
and Theorem 8.14 by observing that 

$$\varrho(L_1 + L_2) = \int_0^1 \mathrm{ES}_\alpha(L_1 + L_2) \, d\mu(\alpha)
    \le \int_0^1 \mathrm{ES}_\alpha(L_1) \, d\mu(\alpha) + \int_0^1 \mathrm{ES}_\alpha(L_2) \, d\mu(\alpha)
    = \varrho(L_1) + \varrho(L_2).$$ 

In summary, we have verified that distortion risk measures are law invariant, 
coherent and comonotone additive. In fact, it may also be shown that, on a probability 
space without atoms (i.e. a space where P({w}) = 0 for all w), a law-invariant, 
coherent, comonotone-additive risk measure must be of the form (8.23) for some 
convex distortion D. 


Example 8.20 (parametric families). A number of useful parametric families of 
distortion risk measures can be based on convex distortion functions that take the 
form 

Dg(u) = ww !(u) +Inl-a)), O<a<1l, (8.27) 


where W is a continuous df on R. The distortion function for expected shortfall is 
obtained when ¥ (u) = 1 — e~” for u > 0, the standard exponential df. 

There has been interest in distortion risk measures that are obtained by consider- 
ing different dfs W that are strictly increasing on the whole real line R. A natural 
question concerns the constraints on W that lead to convex distortion functions. It 
is straightforward to verify, by differentiating D,(u) in (8.27) twice with respect 
to u, that a necessary and sufficient condition is that In y(u) is concave, where y 
denotes the density of W (see Tsukahara 2009). 

A family of convex distortion functions of the form (8.27) is strictly decreasing ina 
for fixed u. Moreover, Do(u) = u (corresponding to the risk measure 9(L) = E(L)) 
and limg-,; D(u) = liu=1}. The fact that for aj < a2 and 0 < u < 1 we have 
Dg, (u) > Do, (u) means that, roughly speaking, Da, distorts the original probability 
measure more than Dg, and places more weight on outcomes in the tail. 

A particular example is obtained by taking $\Psi(u) = 1/(1 + e^{-u})$, the standard
logistic df. This leads to the parametric family of convex distortion functions given
by $D_\alpha(u) = (1-\alpha)u(1-\alpha u)^{-1}$ for $0 \le \alpha < 1$. Writing $G_\alpha(x) = D_\alpha \circ F_L(x)$ we
can show that

\[ \frac{1 - G_\alpha(x)}{G_\alpha(x)} = \frac{1}{1-\alpha}\,\frac{1 - F_L(x)}{F_L(x)}, \]

and this yields an interesting interpretation for this family. For every possibly critical
loss level $x$, the odds of the tail event $\{L > x\}$ given by $(1 - F_L(x))/F_L(x)$ are
multiplied by $(1-\alpha)^{-1}$ under the distorted loss distribution. For this reason the
family is known as the proportional odds family.
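
The two families in Example 8.20 can be checked numerically against their closed forms. The following is a minimal sketch: the function names are illustrative and not taken from the text, and the closed form $(1-\alpha)^{-1}(u-\alpha)^+$ for the expected shortfall distortion is assumed here as the target of the exponential case.

```python
import math


def distortion(u, alpha, cdf, quantile):
    """D_alpha(u) = Psi(Psi^{-1}(u) + ln(1 - alpha)), as in (8.27), for 0 <= u <= 1."""
    if u <= 0.0:
        return 0.0
    if u >= 1.0:
        return 1.0
    return cdf(quantile(u) + math.log(1.0 - alpha))


def exponential_cdf(x):
    """Standard exponential df, equal to zero on the negative half-line."""
    return 1.0 - math.exp(-x) if x > 0.0 else 0.0


def exponential_quantile(u):
    return -math.log(1.0 - u)


def logistic_cdf(x):
    return 1.0 / (1.0 + math.exp(-x))


def logistic_quantile(u):
    return math.log(u / (1.0 - u))


def expected_shortfall_distortion(u, alpha):
    """Closed form (1 - alpha)^{-1} (u - alpha)^+ (assumed target for the exponential df)."""
    return max(u - alpha, 0.0) / (1.0 - alpha)


def proportional_odds_distortion(u, alpha):
    """Closed form (1 - alpha) u / (1 - alpha u) stated for the logistic df."""
    return (1.0 - alpha) * u / (1.0 - alpha * u)


# Compare the generic construction with both closed forms on a grid of u values.
grid = [k / 20.0 for k in range(21)]
alpha = 0.9
max_gap_es = max(
    abs(distortion(u, alpha, exponential_cdf, exponential_quantile)
        - expected_shortfall_distortion(u, alpha))
    for u in grid
)
max_gap_odds = max(
    abs(distortion(u, alpha, logistic_cdf, logistic_quantile)
        - proportional_odds_distortion(u, alpha))
    for u in grid
)
```

Both gaps should be at the level of floating-point rounding. Passing the df and its inverse as arguments keeps the construction (8.27) separate from the choice of $\Psi$, so other strictly increasing dfs can be tried in the same way.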


8.2.2 The Expectile Risk Measure

Definition 8.21. Let $\mathcal{M} := L^1(\Omega, \mathcal{F}, P)$, the set of all integrable random variables
$L$ with $E|L| < \infty$. Then, for $\alpha \in (0, 1)$ and $L \in \mathcal{M}$, the $\alpha$-expectile $e_\alpha(L)$ is given
by the unique solution $y$ of the equation

\[ \alpha E((L-y)^+) = (1-\alpha) E((L-y)^-), \tag{8.28} \]

where $x^+ = \max(x, 0)$ and $x^- = \max(-x, 0)$.

Recalling that $x^+ - x^- = x$, we note that $e_{0.5}(L) = E(L)$ since

\[ E((L-y)^+) = E((L-y)^-) \iff E((L-y)^+ - (L-y)^-) = 0 \iff E(L-y) = 0. \]


For square-integrable losses $L$, the expectile $e_\alpha(L)$ can also be viewed as the
minimizer in an optimization problem of the form

\[ \min_{y \in \mathbb{R}} E(S(y, L)) \tag{8.29} \]

for a so-called scoring function $S(y, L)$. This could be relevant for the out-of-sample
testing of expectile estimates (so-called backtesting), as will be explained
in Section 9.3. The particular scoring function that yields the expectile is

\[ S^e_\alpha(y, L) = |1_{\{L \le y\}} - \alpha|\,(L - y)^2. \tag{8.30} \]

In fact, we can compute that

\begin{align*}
\frac{d}{dy} E(S^e_\alpha(y, L)) &= \frac{d}{dy} \int_{-\infty}^{\infty} |1_{\{y \ge x\}} - \alpha|\,(y - x)^2 \, dF_L(x) \\
&= \frac{d}{dy} \int_{-\infty}^{y} (1-\alpha)(y-x)^2 \, dF_L(x) + \frac{d}{dy} \int_{y}^{\infty} \alpha (y-x)^2 \, dF_L(x) \\
&= 2(1-\alpha) \int_{-\infty}^{y} (y-x) \, dF_L(x) + 2\alpha \int_{y}^{\infty} (y-x) \, dF_L(x) \\
&= 2(1-\alpha) E((L-y)^-) - 2\alpha E((L-y)^+), \tag{8.31}
\end{align*}

and setting this equal to zero yields the equation (8.28) that defines an expectile.
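
The defining equation (8.28) and the scoring function (8.30) can be illustrated on the empirical distribution of a small sample. The sketch below is an assumption-laden illustration rather than a recommended estimator: the bisection solver, the function names and the four-point sample are all chosen here for the example.

```python
def expectile(sample, alpha, iterations=200):
    """alpha-expectile of the empirical distribution of `sample`, by bisection on (8.28)."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    n = len(sample)

    def gap(y):
        # alpha * E((L - y)^+) - (1 - alpha) * E((L - y)^-), which is decreasing in y
        expected_pos = sum(max(x - y, 0.0) for x in sample) / n
        expected_neg = sum(max(y - x, 0.0) for x in sample) / n
        return alpha * expected_pos - (1.0 - alpha) * expected_neg

    # gap >= 0 at the sample minimum and gap <= 0 at the sample maximum
    lo, hi = min(sample), max(sample)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if gap(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def expected_score(sample, alpha, y):
    """Sample mean of the scoring function (8.30) at the candidate value y."""
    return sum(
        abs((1.0 if x <= y else 0.0) - alpha) * (x - y) ** 2 for x in sample
    ) / len(sample)


losses = [1.0, 2.0, 3.0, 10.0]
e_half = expectile(losses, 0.5)   # coincides with the sample mean
e_high = expectile(losses, 0.9)   # solves 0.9 * (10 - y) = 0.1 * (3y - 6) on (3, 10)
```

For this sample the 0.5-expectile is the mean 4 and the 0.9-expectile is 8, and the mean score (8.30) at 8 is no larger than at nearby candidate values, in line with (8.29).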


Remark 8.22. In Section 9.3.3 we will show that the $\alpha$-quantile $q_\alpha(L)$ is also a
minimizer in an optimization problem of the form (8.29) if we consider the scoring
function

\[ S^q_\alpha(y, L) = |1_{\{L \le y\}} - \alpha|\,|L - y|. \tag{8.32} \]
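
A small empirical illustration of Remark 8.22 follows. It is a sketch under stated assumptions: the candidate values are restricted to the sample points (the mean score is piecewise linear and convex in $y$, so a minimizer can be found among them), and the sample and function names are hypothetical.

```python
def quantile_score(sample, alpha, y):
    """Sample mean of the scoring function (8.32) at the candidate value y."""
    return sum(
        abs((1.0 if x <= y else 0.0) - alpha) * abs(x - y) for x in sample
    ) / len(sample)


def score_minimizer(sample, alpha):
    """Sample point with the smallest mean score (the smallest such point if tied)."""
    return min(sorted(sample), key=lambda y: quantile_score(sample, alpha, y))


losses = [float(k) for k in range(1, 11)]   # the values 1, 2, ..., 10
best = score_minimizer(losses, 0.75)
```

Here `best` equals 8.0, which is the smallest sample point at which the empirical df reaches 0.75, that is, the empirical 0.75-quantile.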


We now show that the $\alpha$-expectile of an arbitrary df $F_L$ can be represented as
the $\alpha$-quantile of a related df $\tilde F_L$ that is strictly increasing on its support. This also
shows the uniqueness of the $\alpha$-expectile of a distribution. Moreover, we obtain a
formula that can be helpful for computing expectiles of certain distributions and we
illustrate this with a simple example.

Proposition 8.23. Let $\alpha \in (0, 1)$ and $L$ be an rv such that $\mu := E(L) < \infty$. Then
the solution $e_\alpha(L)$ of (8.28) may be written as $e_\alpha(L) = \tilde F_L^{-1}(\alpha)$, where

\[ \tilde F_L(y) = \frac{y F_L(y) - \mu(y)}{2(y F_L(y) - \mu(y)) + \mu - y} \tag{8.33} \]

is a continuous df that is strictly increasing on its support and $\mu(y) := \int_{-\infty}^{y} x \, dF_L(x)$
is the lower partial moment of $F_L$.


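Proof. Since $x^+ + x^- = |x|$, equation (8.31) shows that the expectile $y$ must also
solve

\[ \alpha E(|L - y|) = E((L - y)^-) = \int_{-\infty}^{y} (y - x) \, dF_L(x) = y F_L(y) - \mu(y). \]

Moreover,

\begin{align*}
E(|L - y|) &= \int_{-\infty}^{y} (y - x) \, dF_L(x) + \int_{y}^{\infty} (x - y) \, dF_L(x) \\
&= 2 \int_{-\infty}^{y} (y - x) \, dF_L(x) + \int_{-\infty}^{\infty} (x - y) \, dF_L(x) \\
&= 2(y F_L(y) - \mu(y)) + \mu - y,
\end{align*}

and hence $\alpha = \tilde F_L(y)$ with $\tilde F_L$ as defined in (8.33).

Next we show that $\tilde F_L$ is indeed a distribution function. The derivative of $\tilde F_L$ can
be easily computed to be

\[ \tilde f_L(y) = \frac{\mu F_L(y) - \mu(y)}{(2(y F_L(y) - \mu(y)) + \mu - y)^2} = \frac{F_L(y)(\mu - E(L \mid L \le y))}{(2(y F_L(y) - \mu(y)) + \mu - y)^2}. \]

Clearly, $\tilde f_L$ is nonnegative for all $y$ and strictly positive on $D = \{y : 0 < F_L(y) < 1\}$,
so that $\tilde F_L$ is increasing for all $y$ and strictly increasing on $D$. Let $y_0 = \inf D$ and
$y_1 = \sup D$ denote the left and right endpoints of $F_L$. It is then easy to check that
$[y_0, y_1]$ is the support of $\tilde F_L$ and $\lim_{y \to y_0} \tilde F_L(y) = 0$ and $\lim_{y \to y_1} \tilde F_L(y) = 1$.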


In the following example we consider a Bernoulli distribution where the quantile 
is an unsatisfactory risk measure that can only take the values zero and one. In 
contrast, the expectile can take any value between zero and one. 


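Example 8.24. Let $L \sim \mathrm{Be}(p)$ be a Bernoulli-distributed loss. Then

\[ F_L(y) = \begin{cases} 0, & y < 0, \\ 1 - p, & 0 \le y < 1, \\ 1, & y \ge 1, \end{cases} \qquad \mu(y) = \begin{cases} 0, & y < 1, \\ p, & y \ge 1, \end{cases} \]

from which it follows that

\[ \tilde F_L(y) = \frac{y(1-p)}{y(1-2p) + p}, \quad 0 \le y \le 1, \qquad \text{and} \qquad e_\alpha(L) = \frac{\alpha p}{(1-\alpha) + p(2\alpha - 1)}. \]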
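
The closed form in Example 8.24 can be checked against both the defining equation (8.28) and the representation $e_\alpha(L) = \tilde F_L^{-1}(\alpha)$. This is a minimal sketch; the function names and the parameter values $\alpha = 0.99$, $p = 0.05$ are illustrative choices, not taken from the text.

```python
def bernoulli_expectile(alpha, p):
    """Closed-form alpha-expectile of a Be(p) loss from Example 8.24."""
    return alpha * p / ((1.0 - alpha) + p * (2.0 * alpha - 1.0))


def bernoulli_f_tilde(y, p):
    """The df (8.33) for a Be(p) loss, evaluated at 0 <= y <= 1."""
    return y * (1.0 - p) / (y * (1.0 - 2.0 * p) + p)


def expectile_equation_gap(y, alpha, p):
    """Left side minus right side of (8.28) for a Be(p) loss and 0 <= y <= 1."""
    expected_pos = p * (1.0 - y)      # E((L - y)^+): only the outcome L = 1 exceeds y
    expected_neg = (1.0 - p) * y      # E((L - y)^-): only the outcome L = 0 lies below y
    return alpha * expected_pos - (1.0 - alpha) * expected_neg


alpha, p = 0.99, 0.05
e = bernoulli_expectile(alpha, p)               # a value strictly between 0 and 1
gap = expectile_equation_gap(e, alpha, p)       # should vanish up to rounding
level = bernoulli_f_tilde(e, p)                 # should reproduce alpha
```

For these parameter values the 0.99-quantile of $L$ is zero, whereas the expectile lies strictly inside $(0, 1)$, which is the contrast the example is drawing.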


Properties of the expectile.


Proposition 8.25. Provided $\alpha \ge 0.5$, the expectile risk measure $\varrho = e_\alpha$ is a coherent
risk measure on $\mathcal{M} = L^1(\Omega, \mathcal{F}, P)$.


Proof. For $L \in \mathcal{M}$ and $y \in \mathbb{R}$ define the function

\begin{align*}
g(L, y, \alpha) &= \alpha E((L - y)^+) - (1-\alpha) E((L - y)^-) \\
&= (2\alpha - 1) E((L - y)^+) + (1-\alpha) E(L - y)
\end{align*}

and note that, for fixed $L$, it is a decreasing function of $y$ and, for fixed $y$, it is
monotonic in $L$, so $L_1 \le L_2 \Rightarrow g(L_1, y, \alpha) \le g(L_2, y, \alpha)$.

Translation invariance and positive homogeneity follow easily from the fact that
if $g(L, y, \alpha) = 0$ (i.e. if $e_\alpha(L) = y$), then $g(L + m, y + m, \alpha) = 0$ for $m \in \mathbb{R}$ and
$g(\lambda L, \lambda y, \alpha) = 0$ for $\lambda > 0$.

For monotonicity, fix $\alpha$ and let $y_1 = e_\alpha(L_1)$, $y_2 = e_\alpha(L_2)$. If $L_2 \ge L_1$ then
$g(L_2, y_1, \alpha) \ge g(L_1, y_1, \alpha) = g(L_2, y_2, \alpha) = 0$. Since $g$ is decreasing in $y$, it
must be the case that $y_2 \ge y_1$.

For subadditivity, again let $y_1 = e_\alpha(L_1)$, $y_2 = e_\alpha(L_2)$. We have that

\begin{align*}
g(L_1 + L_2, y_1 + y_2, \alpha) &= (2\alpha - 1) E((L_1 + L_2 - y_1 - y_2)^+) + (1-\alpha) E(L_1 + L_2 - y_1 - y_2) \\
&= (2\alpha - 1) E((L_1 + L_2 - y_1 - y_2)^+) + (1-\alpha) E(L_1 - y_1) + (1-\alpha) E(L_2 - y_2),
\end{align*}

and, since $(2\alpha - 1) E((L_i - y_i)^+) + (1-\alpha) E(L_i - y_i) = 0$ for $i = 1, 2$, we get

\[ g(L_1 + L_2, y_1 + y_2, \alpha) = (2\alpha - 1)\bigl(E((L_1 + L_2 - y_1 - y_2)^+) - E((L_1 - y_1)^+) - E((L_2 - y_2)^+)\bigr) \le 0, \]

where we have used the fact that $(x_1 + x_2)^+ \le x_1^+ + x_2^+$. Since $g(L, y, \alpha)$ is
decreasing in $y$ it must be the case that $e_\alpha(L_1 + L_2) \le y_1 + y_2$.


The expectile risk measure is a law-invariant, coherent risk measure for $\alpha \ge 0.5$.
However, it is not comonotone additive and therefore does not belong to the class
of distortion risk measures described in Section 8.2.1.

If $L_1$ and $L_2$ are comonotonic random variables of the same type (so that $L_2 =
kL_1 + m$ for some $m \in \mathbb{R}$ and $k > 0$), then we do have comonotone additivity (by the
properties of translation invariance and positive homogeneity), but for comonotonic
variables that are not of the same type we can find examples where $e_\alpha(L_1 + L_2) <
e_\alpha(L_1) + e_\alpha(L_2)$ for $\alpha > 0.5$.
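
The behaviour of the expectile on comonotonic losses can be explored numerically. The sketch below works with empirical distributions on a deterministic grid; the bisection solver, the grid and the choice of an increasing cubic transformation for the second loss are assumptions made for illustration only.

```python
def expectile(sample, alpha, iterations=200):
    """alpha-expectile of the empirical distribution of `sample`, by bisection on (8.28)."""
    n = len(sample)

    def gap(y):
        expected_pos = sum(max(x - y, 0.0) for x in sample) / n
        expected_neg = sum(max(y - x, 0.0) for x in sample) / n
        return alpha * expected_pos - (1.0 - alpha) * expected_neg

    lo, hi = min(sample), max(sample)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if gap(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


alpha = 0.95
n = 1000
grid = [(k + 0.5) / n for k in range(n)]      # equally weighted scenarios
loss_1 = grid
loss_2 = [x ** 3 for x in grid]               # comonotonic with loss_1, not of the same type
loss_3 = [2.0 * x + 1.0 for x in grid]        # comonotonic with loss_1 and of the same type

# e(L1) + e(L2) - e(L1 + L2): nonnegative by subadditivity
gap_different_type = (
    expectile(loss_1, alpha) + expectile(loss_2, alpha)
    - expectile([a + b for a, b in zip(loss_1, loss_2)], alpha)
)
# e(L1) + e(L3) - e(L1 + L3): zero up to rounding for losses of the same type
gap_same_type = (
    expectile(loss_1, alpha) + expectile(loss_3, alpha)
    - expectile([a + b for a, b in zip(loss_1, loss_3)], alpha)
)
```

Proposition 8.25 guarantees that `gap_different_type` is nonnegative; whether it is strictly positive depends on the losses chosen, so the value should be inspected rather than assumed.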


Notes and Comments 


For distortion risk measures we use the definition of Tsukahara (2009) but restrict
our attention to convex distortion functions. Using this definition, distortion risk
measures are equivalent to the spectral risk measures of Kusuoka (2001), Acerbi
(2002) and Tasche (2002). A parallel notion of distortion risk measures (or premium
principles) has been developed in the insurance mathematics literature, where they
are also known as Wang measures (see Wang 1996), although the ideas are also
found in Denneberg (1990). A good discussion of these risk measures is given by
Denuit and Charpentier (2004). The characterization of distortion risk measures on
atomless probability spaces as law-invariant, comonotone-additive, coherent risk
measures is due to Kusuoka (2001). The representation as averages of expected
shortfalls is found in Föllmer and Schied (2011).

Adam, Houkari and Laurent (2008) give examples of distortion risk measures
used in the context of portfolio optimization. Further examples can be found in 
Tsukahara (2009), which discusses different choices of distortion function, the prop- 
erties of the resulting risk measures and statistical estimation. The concept of a dis- 
tortion also plays an important role in mathematical developments within prospect 
theory and behavioural finance: see, for instance, Zhou (2010) and He and Zhou 
(2011). 

Expectiles have emerged as risk measures around the recent discussion related to 
elicitability, a statistical notion used for the comparison of forecasts (see Gneiting 
2011; Ziegel 2015). This issue will be discussed in more detail in Chapter 9. Early 
references on expectiles include the papers by Newey and Powell (1987), Jones 
(1994) and Abdous and Rémillard (1995); a textbook treatment is given in Rémillard
(2013). 


8.3 Risk Measures for Linear Portfolios 


In this section we consider linear portfolios in the set

\[ \mathcal{M} = \{L : L = m + \lambda'X,\ m \in \mathbb{R},\ \lambda \in \mathbb{R}^d\}, \tag{8.34} \]

where $X$ is a fixed $d$-dimensional random vector of risk factors defined on some
probability space $(\Omega, \mathcal{F}, P)$. The case of linear portfolios is interesting for a number
of reasons. To begin with, many standard approaches to risk aggregation and capital 
allocation are explicitly or implicitly based on the assumption that portfolio losses 
have a linear relationship to underlying risk factors. Moreover, as we observed in 
Chapter 2, it is common to use linear approximations for losses due to market risks 
over short time horizons. 

In Section 8.3.1 we apply the dual representation of coherent risk measures to the 
case of linear portfolios. We show that every coherent risk measure on the set M in 
(8.34) can be viewed as a stress test in the sense of Example 8.9. In Section 8.3.2 we 
consider the important case where the factor vector X has an elliptical distribution, 
and in Section 8.3.3 we consider briefly the case of non-elliptical distributions. As 
well as deriving the form of the stress test in Section 8.3.2 we also collect a number 
of important related results concerning risk measurement for linear portfolios of 
elliptically distributed risks. 


8.3.1 Coherent Risk Measures as Stress Tests 


Given a positive-homogeneous risk measure $\varrho\colon \mathcal{M} \to \mathbb{R}$ it is convenient to define
a risk-measure function $r_\varrho(\lambda) = \varrho(\lambda'X)$, which can be thought of as a function of
portfolio weights. There is a one-to-one relationship between $\varrho$ and $r_\varrho$ given by

\[ \varrho(m + \lambda'X) = m + r_\varrho(\lambda). \]

Properties of $\varrho$ therefore carry over to $r_\varrho$, and vice versa. We summarize the results
in the following lemma.


Lemma 8.26. Consider some translation-invariant risk measure $\varrho\colon \mathcal{M} \to \mathbb{R}$ with
associated risk-measure function $r_\varrho$.


(1) $\varrho$ is a positive-homogeneous risk measure if and only if $r_\varrho$ is a positive-homogeneous
function on $\mathbb{R}^d$, that is, $r_\varrho(t\lambda) = t\,r_\varrho(\lambda)$ for all $t > 0$, $\lambda \in \mathbb{R}^d$.


(2) Suppose that $\varrho$ is positive homogeneous. Then $\varrho$ is subadditive if and only if
$r_\varrho$ is a convex function on $\mathbb{R}^d$.


The result follows easily from the definitions; a formal proof is therefore omitted. 

The main result of this section shows that a coherent risk measure on the set of
linear portfolios can be viewed as a stress test of the kind described in Example 8.9,
where the scenario set is given by the set

\[ S_\varrho := \{x \in \mathbb{R}^d : u'x \le r_\varrho(u) \text{ for all } u \in \mathbb{R}^d\}. \tag{8.35} \]


Proposition 8.27. $\varrho$ is a coherent risk measure on the set of linear portfolios $\mathcal{M}$
in (8.34) if and only if for every $L = m + \lambda'X \in \mathcal{M}$ we have the representation

\[ \varrho(L) = m + r_\varrho(\lambda) = \sup\{m + \lambda'x : x \in S_\varrho\}. \tag{8.36} \]


Proof. The risk measure given in (8.36) can be viewed as a generalized scenario:
$\varrho(L) = \sup\{E_Q(L) : Q \in \mathcal{Q}\}$, where $\mathcal{Q}$ is the set of Dirac measures $\{\delta_x : x \in S_\varrho\}$
(see Example 8.10). Such a risk measure is automatically coherent.

Conversely, suppose that $\varrho$ is a coherent risk measure on the linear portfolio set
$\mathcal{M}$. Since $\varrho$ is translation invariant we can set $m = 0$ and consider random variables
$L = \lambda'X \in \mathcal{M}$. Since $E_Q(\lambda'X) = \lambda'E_Q(X)$, Theorem 8.11 shows that

\[ \varrho(\lambda'X) = \sup\{\lambda'E_Q(X) : Q \in \mathcal{S}^1(\Omega, \mathcal{F}),\ \alpha_{\min}(Q) = 0\}. \tag{8.37} \]

According to relation (8.14), the measures $Q \in \mathcal{S}^1(\Omega, \mathcal{F})$ for which $\alpha_{\min}(Q) = 0$
are those for which $E_Q(L) \le \varrho(L)$ for all $L \in \mathcal{M}$. Hence we get

\begin{align*}
\{Q \in \mathcal{S}^1(\Omega, \mathcal{F}) : \alpha_{\min}(Q) = 0\}
&= \{Q \in \mathcal{S}^1(\Omega, \mathcal{F}) : u'E_Q(X) \le r_\varrho(u) \text{ for all } u \in \mathbb{R}^d\} \\
&= \{Q \in \mathcal{S}^1(\Omega, \mathcal{F}) : E_Q(X) \in S_\varrho\}. \tag{8.38}
\end{align*}

Now define the set $C := \{\mu \in \mathbb{R}^d : \exists Q \in \mathcal{S}^1(\Omega, \mathcal{F}) \text{ with } \mu = E_Q(X)\}$, and denote
the closure of $C$ by $\bar C$. Note that $C$ and hence also $\bar C$ are convex subsets of $\mathbb{R}^d$. By
combining (8.37) and (8.38) we therefore obtain

\[ \varrho(\lambda'X) = \sup\{\lambda'\mu : \mu \in C \cap S_\varrho\} = \sup\{\lambda'\mu : \mu \in \bar C \cap S_\varrho\}. \]

If $S_\varrho \subset \bar C$, the last equation is equivalent to $\varrho(\lambda'X) = \sup\{\lambda'\mu : \mu \in S_\varrho\}$, which
is the result we require. Note that the key insight in this argument is the fact that a
probability measure on the linear portfolio space $\mathcal{M}$ can be identified by its mean
vector; it is here that the special structure of $\mathcal{M}$ enters.

To verify that $S_\varrho \subset \bar C$, suppose to the contrary that there is some $\mu_0 \in S_\varrho$ that
does not belong to $\bar C$. According to Proposition A.8 (b) (the strict separation part of
the separating hyperplane theorem), there would exist some $u^* \in \mathbb{R}^d \setminus \{0\}$ such that
$\mu_0'u^* > \sup\{\mu'u^* : \mu \in \bar C\}$. It follows that

\begin{align*}
r_\varrho(u^*) = \varrho((u^*)'X) &\le \sup\{E_Q((u^*)'X) : Q \in \mathcal{S}^1(\Omega, \mathcal{F})\} \\
&= \sup\{\mu'u^* : \mu \in \bar C\} < \mu_0'u^*.
\end{align*}

This contradicts the fact that $\mu_0 \in S_\varrho$, which requires $\mu_0'u^* \le r_\varrho(u^*)$.


The scenario set $S_\varrho$ in (8.35) is an intersection of the half-spaces $H_u = \{x \in
\mathbb{R}^d : u'x \le r_\varrho(u)\}$, so $S_\varrho$ is a closed convex set. The precise form of $S_\varrho$ depends
on the distribution of $X$ and on the risk measure $\varrho$. In the case of the quantile risk
measure $\varrho = \mathrm{VaR}_\alpha$, the set $S_\varrho$ has a probabilistic interpretation as a so-called depth
set. Suppose that $X$ is such that for all $u \in \mathbb{R}^d \setminus \{0\}$ the random variable $u'X$ has
a continuous distribution function. Then, for $H_u := \{x \in \mathbb{R}^d : u'x \le \mathrm{VaR}_\alpha(u'X)\}$
we have that $P(X \in H_u) = \alpha$, so that the set $S_{\mathrm{VaR}_\alpha}$ is the intersection of all
half-spaces with probability $\alpha$.


8.3.2 Elliptically Distributed Risk Factors 


We have seen in Chapter 6 that an elliptical model may be a reasonable approximate 
model for various kinds of risk-factor data, such as stock or exchange-rate returns. 
The next result summarizes some key results for risk measurement on linear spaces 
when the underlying distribution of the risk factors is elliptical. 


Theorem 8.28 (risk measurement for elliptical risk factors). Suppose that
$X \sim E_d(\mu, \Sigma, \psi)$ and let $\mathcal{M}$ be the space of linear portfolios (8.34). For any
positive-homogeneous, translation-invariant and law-invariant risk measure $\varrho$ on
$\mathcal{M}$ the following properties hold.


(1) For any $L = m + \lambda'X \in \mathcal{M}$ we have

\[ \varrho(L) = \sqrt{\lambda'\Sigma\lambda}\,\varrho(Y) + \lambda'\mu + m, \tag{8.39} \]

where $Y \sim S_1(\psi)$, i.e. a univariate spherical distribution (a distribution that
is symmetric around 0) with generator $\psi$.


(2) If $\varrho(Y) \ge 0$ then $\varrho$ is subadditive on $\mathcal{M}$. In particular, $\mathrm{VaR}_\alpha$ is subadditive if
$\alpha \ge 0.5$.


(3) If $X$ has a finite-mean vector, then, for any $L = m + \lambda'X \in \mathcal{M}$, we have

\[ \varrho(L - E(L)) = \sqrt{\sum_{i=1}^{d} \sum_{j=1}^{d} \rho_{ij} \lambda_i \lambda_j \varrho(X_i - E(X_i)) \varrho(X_j - E(X_j))}, \tag{8.40} \]

where the $\rho_{ij}$ are elements of the correlation matrix $\wp(\Sigma)$.


(4) If $\varrho(Y) > 0$ and $X$ has a finite covariance matrix, then, for every $L \in \mathcal{M}$,

\[ \varrho(L) = E(L) + k_\varrho \sqrt{\operatorname{var}(L)} \tag{8.41} \]

for some constant $k_\varrho > 0$ that depends on the risk measure.


(5) If $\varrho(Y) > 0$ and $\Sigma$ is invertible, then the scenario set $S_\varrho$ in the stress-test
representation (8.36) of $\varrho$ is given by the ellipsoid

\[ S_\varrho = \{x : (x - \mu)'\Sigma^{-1}(x - \mu) \le \varrho(Y)^2\}. \]
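
Proof. For any $L \in \mathcal{M}$ it follows from Definition 6.25 that we can write

\[ L = m + \lambda'X \stackrel{d}{=} \lambda'A\boldsymbol{Y} + \lambda'\mu + m \]

for a spherical random vector $\boldsymbol{Y} \sim S_k(\psi)$, a matrix $A \in \mathbb{R}^{d \times k}$ satisfying $AA' = \Sigma$
and a constant vector $\mu \in \mathbb{R}^d$. By Theorem 6.18 (3) we have

\[ L \stackrel{d}{=} \|A'\lambda\|\,Y + \lambda'\mu + m, \tag{8.42} \]

where $Y$ is a component of the random vector $\boldsymbol{Y}$ that has the symmetric distribution
$Y \sim S_1(\psi)$. Every $L \in \mathcal{M}$ is therefore an rv of the same type, and the translation
invariance and homogeneity of $\varrho$ imply that

\[ \varrho(L) = \|A'\lambda\|\,\varrho(Y) + \lambda'\mu + m, \tag{8.43} \]

so that claim (1) follows.

For (2) set $L_1 = m_1 + \lambda_1'X$ and $L_2 = m_2 + \lambda_2'X$. Since $\|A'(\lambda_1 + \lambda_2)\| \le \|A'\lambda_1\| +
\|A'\lambda_2\|$ and since $\varrho(Y) \ge 0$, the subadditivity of the risk measure follows easily.

Since $E(L) = \lambda'\mu + m$ and $\varrho$ is translation invariant, formula (8.39) implies that

\[ \varrho(L - E(L)) = \Bigl(\sum_{i=1}^{d} \sum_{j=1}^{d} \rho_{ij} \lambda_i \lambda_j \sigma_i \sigma_j\Bigr)^{1/2} \varrho(Y), \tag{8.44} \]

where $\sigma_i = \sqrt{\Sigma_{ii}}$ for $i = 1, \ldots, d$. As a special case of (8.44) we observe that

\[ \varrho(X_i - E(X_i)) = \varrho(e_i'X - E(e_i'X)) = \sigma_i \varrho(Y). \tag{8.45} \]

Combining (8.44) and (8.45) yields formula (8.40) and proves part (3).

For (4) assume that $\operatorname{cov}(X) = c\Sigma$ for some positive constant $c$. It follows easily
from (8.44) that $\varrho(L) = E(L) + \sqrt{\operatorname{var}(L)}\,\varrho(Y)/\sqrt{c}$ and $k_\varrho = \varrho(Y)/\sqrt{c}$.

For (5) note that part (2) implies that the risk-measure function $r_\varrho(\lambda)$ takes the form
$r_\varrho(\lambda) = \|A'\lambda\|\,\varrho(Y) + \lambda'\mu$, so that the set $S_\varrho$ in (8.36) is

\begin{align*}
S_\varrho &= \{x \in \mathbb{R}^d : u'x \le u'\mu + \|A'u\|\,\varrho(Y),\ \forall u \in \mathbb{R}^d\} \\
&= \{x \in \mathbb{R}^d : u'AA^{-1}(x - \mu) \le \|A'u\|\,\varrho(Y),\ \forall u \in \mathbb{R}^d\} \\
&= \Bigl\{x \in \mathbb{R}^d : v'\frac{A^{-1}(x - \mu)}{\varrho(Y)} \le \|v\|,\ \forall v \in \mathbb{R}^d\Bigr\},
\end{align*}

where the last line follows because $\mathbb{R}^d = \{A'u : u \in \mathbb{R}^d\}$. By observing that the
Euclidean unit ball $\{y \in \mathbb{R}^d : y'y \le 1\}$ can be written as the set $\{y \in \mathbb{R}^d : v'y \le
\|v\|,\ \forall v \in \mathbb{R}^d\}$, we conclude that, for $x \in S_\varrho$, the vectors $y = A^{-1}(x - \mu)/\varrho(Y)$
describe the unit ball and therefore

\[ S_\varrho = \{x \in \mathbb{R}^d : (x - \mu)'\Sigma^{-1}(x - \mu) \le \varrho(Y)^2\}. \]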


The various parts of Theorem 8.28 have a number of important implications. 
Part (2) gives a special case where the VaR risk measure is subadditive and therefore 
coherent; we recall from Section 2.25 that this is not the case in general. Part (3) 
gives a useful interpretation of risk measures on M in terms of the aggregation of 
stress tests, as will be seen later in Section 8.4. 

Part (4) is relevant to portfolio optimization. If we consider only the portfolio
losses $L \in \mathcal{M}$ for which $E(L)$ is fixed at some level, then the portfolio weights that
minimize $\varrho$ also minimize the variance. The portfolio minimizing the risk measure
$\varrho$ is the same as the Markowitz variance-minimizing portfolio.

Part (5) shows that the scenario sets in the stress-test representation of coherent
risk measures are ellipsoids when the distribution of risk-factor changes is elliptical.
Moreover, for different examples of law-invariant coherent risk measures, we simply
obtain ellipsoids of differing radius $\varrho(Y)$. Scenario sets of ellipsoidal form are often
used in practice and this result provides a justification for this practice in the case
of linear portfolios of elliptical risk factors.
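
Parts (1) and (5) of Theorem 8.28 can be checked numerically in a special case. The following is a minimal sketch assuming two Gaussian risk factors (an elliptical model in which $\Sigma$ is the covariance matrix and $Y$ is standard normal) and $\varrho = \mathrm{VaR}_\alpha$, so that $\varrho(Y)$ is the standard normal $\alpha$-quantile. The parameter values and function names are illustrative and not taken from the text.

```python
import math
from statistics import NormalDist


def quad_form(a, matrix, b):
    """a' M b for vectors a, b and a square matrix M given as a list of rows."""
    return sum(a[i] * matrix[i][j] * b[j] for i in range(len(a)) for j in range(len(b)))


def inverse_2x2(matrix):
    (a, b), (c, d) = matrix
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def var_linear_portfolio(m, weights, mean, sigma, alpha):
    """Formula (8.39) with rho = VaR_alpha and Gaussian factors, so rho(Y) = z_alpha."""
    z = NormalDist().inv_cdf(alpha)
    drift = sum(w * mu for w, mu in zip(weights, mean))
    return math.sqrt(quad_form(weights, sigma, weights)) * z + drift + m


def worst_case_scenario(weights, mean, sigma, alpha):
    """Point of the ellipsoid in part (5) at which the linear loss is largest."""
    z = NormalDist().inv_cdf(alpha)
    scale = z / math.sqrt(quad_form(weights, sigma, weights))
    d = len(weights)
    return [mean[i] + scale * sum(sigma[i][j] * weights[j] for j in range(d))
            for i in range(d)]


m, weights = 0.5, [2.0, 1.0]
mean = [0.01, 0.02]
sigma = [[0.04, 0.018], [0.018, 0.09]]
alpha = 0.99

z = NormalDist().inv_cdf(alpha)
x_star = worst_case_scenario(weights, mean, sigma, alpha)
centred = [x - mu for x, mu in zip(x_star, mean)]
squared_radius = quad_form(centred, inverse_2x2(sigma), centred)        # should equal z**2
stress_loss = m + sum(w * x for w, x in zip(weights, x_star))           # loss in the scenario
formula_value = var_linear_portfolio(m, weights, mean, sigma, alpha)    # value from (8.39)
```

The scenario `x_star` lies on the boundary of the ellipsoid with radius $\varrho(Y)$, and the loss evaluated there agrees with (8.39), which is the stress-test representation (8.36) in this special case.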


8.3.3 Other Risk Factor Distributions 


We now turn briefly to the application of Proposition 8.27 in situations where the risk
factors do not have an elliptical distribution. The VaR risk measure is not coherent
on the linear space $\mathcal{M}$ in general. Consider the simple case where we have two
independent standard exponentially distributed risk factors $X_1$ and $X_2$.

Here it may happen that $\mathrm{VaR}_\alpha$ is not coherent on $\mathcal{M}$ for some values of $\alpha$. In
such situations, (8.36) does not hold in general and we may find vectors of portfolio
weights $\lambda$ such that

\[ \mathrm{VaR}_\alpha(\lambda'X) > \sup\{\lambda'x : x \in S_\alpha\}, \]

where

\[ S_\alpha := S_{\mathrm{VaR}_\alpha} = \{x \in \mathbb{R}^d : u'x \le \mathrm{VaR}_\alpha(u'X),\ \forall u \in \mathbb{R}^d\}. \]


Such a situation is shown in Figure 8.2 (a) for two independent standard exponential
risk factors. Each line bounds a half-space with probability $\alpha = 0.7$,
and the intersection of these half-spaces (the empty area in the centre) is the
set $S_{0.7}$. Some lines are not supporting hyperplanes of $S_{0.7}$, meaning they do
not touch it; an example is the bold diagonal line in the upper right corner of
the picture. In such situations we can construct vectors of portfolio weights $\lambda_1$
and $\lambda_2$ such that $\mathrm{VaR}_\alpha((\lambda_1 + \lambda_2)'X) > \mathrm{VaR}_\alpha(\lambda_1'X) + \mathrm{VaR}_\alpha(\lambda_2'X)$. In fact, for
$\alpha = 0.7$ we simply have $\mathrm{VaR}_\alpha(X_1 + X_2) > \mathrm{VaR}_\alpha(X_1) + \mathrm{VaR}_\alpha(X_2)$, as may
be deduced from the picture by noting that the black vertical line on the right is
$x_1 = \mathrm{VaR}_\alpha(X_1)$, the upper black horizontal line is $x_2 = \mathrm{VaR}_\alpha(X_2)$, and the bold
diagonal line is $x_1 + x_2 = \mathrm{VaR}_\alpha(X_1 + X_2)$. This agrees with our observation in
Example 7.30.

Figure 8.2. Illustration of scenario sets $S_\alpha$ when $X_1$ and $X_2$ are two independent standard
exponential variates and the risk measure is $\mathrm{VaR}_\alpha$. (a) The case $\alpha = 0.7$, where $\mathrm{VaR}_\alpha$ is not
coherent on $\mathcal{M}$. (b) The case $\alpha = 0.95$, where $\mathrm{VaR}_\alpha$ is coherent.

For the value $\alpha = 0.95$, the depth set is shown in Figure 8.2 (b). In this case the
depth set has a smooth boundary and there are supporting hyperplanes bounding
half-spaces with probability $\alpha$ in every direction. We can apply Proposition 8.27 to
conclude that $\mathrm{VaR}_\alpha$ is a coherent risk measure on $\mathcal{M}$ for $\alpha = 0.95$.
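
The comparison between $\mathrm{VaR}_\alpha(X_1 + X_2)$ and $\mathrm{VaR}_\alpha(X_1) + \mathrm{VaR}_\alpha(X_2)$ for two independent standard exponential risk factors can be reproduced directly. The sketch below uses the fact that the sum has a gamma distribution with df $1 - e^{-x}(1 + x)$ and inverts it by bisection; the function names and the bracketing interval are choices made for this illustration. It examines only the single direction $\lambda = (1, 1)'$ and so does not by itself establish coherence at $\alpha = 0.95$.

```python
import math


def var_exponential(alpha):
    """VaR_alpha of a standard exponential loss: the alpha-quantile -ln(1 - alpha)."""
    return -math.log(1.0 - alpha)


def var_sum_of_two_exponentials(alpha):
    """VaR_alpha of X1 + X2 for independent standard exponentials, by bisection."""

    def cdf(x):
        # df of the sum: a gamma distribution with shape 2 and unit scale
        return 1.0 - math.exp(-x) * (1.0 + x)

    lo, hi = 0.0, 100.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if cdf(mid) < alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Positive excess means VaR of the sum exceeds the sum of the stand-alone VaRs.
excess_070 = var_sum_of_two_exponentials(0.70) - 2.0 * var_exponential(0.70)
excess_095 = var_sum_of_two_exponentials(0.95) - 2.0 * var_exponential(0.95)
```

The excess is positive at $\alpha = 0.7$ and negative at $\alpha = 0.95$, matching the two panels of Figure 8.2 in the direction of the bold diagonal line.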

These issues do not arise for the expected shortfall risk measure or any other
coherent risk measure. The scenario set $S_\varrho$ for these risk measures would have
supporting hyperplanes with equation $u'x = r_\varrho(u)$ for every direction $u \ne 0$.


Notes and Comments 


The presentation of the relationship between coherent risk measures and stress tests 
on linear portfolio spaces is based on McNeil and Smith (2012). Our definition of a 
stress test coincides with the concept of the maximum loss risk measure introduced by 
Studer (1997, 1999), who also considers ellipsoidal scenario sets. Breuer et al. (2009) 
describe the problem of finding scenarios that are “plausible, severe and useful” and 
propose a number of refinements to Studer’s approach based on ellipsoidal sets. 

The depth sets obtained when the risk measure is VaR have an interesting history 
in statistics and have been studied by Massé and Theodorescu (1994) and Rousseeuw 
and Ruts (1999), among others. The concept has its origins in an empirical concept of 
data depth introduced by Tukey (1975) as well as in theoretical work on multivariate 
analogues of the quantile function by Eddy (1984) and Nolan (1992). 

The treatment of the implications of elliptical distributions for risk measurement 
follows Embrechts, McNeil and Straumann (2002). Chapter 9 of Hult et al. (2012) 
contains an interesting discussion of elliptical distributions in risk management. 
There is an extensive body of economic theory related to the use of elliptical dis- 
tributions in finance. The papers by Owen and Rabinovitch (1983), Chamberlain 
(1983) and Berk (1997) provide an entry to the area. Landsman and Valdez (2003) 
discuss the explicit calculation of the quantity $E(L \mid L > q_\alpha(L))$ for portfolios of
elliptically distributed risks. This coincides with expected shortfall for continuous 
loss distributions (see Proposition 2.13). 


8.4 Risk Aggregation 


The need to aggregate risk can arise in a number of situations. Suppose that capital
amounts $\mathrm{EC}_1, \ldots, \mathrm{EC}_d$ (EC stands for economic capital) have been computed for
each of $d$ subsidiaries or business lines making up an enterprise and a method
for computing the aggregate capital for the whole enterprise is required. Or, in a
similar vein, suppose that capital amounts $\mathrm{EC}_1, \ldots, \mathrm{EC}_d$ have been computed for $d$
different asset classes on the balance sheet of an enterprise and a method is required
to compute the overall capital required to back all assets.

A risk-aggregation rule is a mapping

\[ f\colon \mathbb{R}^d \to \mathbb{R}, \quad f(\mathrm{EC}_1, \ldots, \mathrm{EC}_d) = \mathrm{EC}, \]

which takes as input the individual capital amounts and gives as output the aggregate
capital EC. Examples of commonly used rules are simple summation

\[ \mathrm{EC} = \mathrm{EC}_1 + \cdots + \mathrm{EC}_d \tag{8.46} \]

and correlation adjusted summation

\[ \mathrm{EC} = \sqrt{\sum_{i=1}^{d} \sum_{j=1}^{d} \rho_{ij}\,\mathrm{EC}_i\,\mathrm{EC}_j}, \tag{8.47} \]

where the $\rho_{ij}$ are a set of parameters satisfying $0 \le \rho_{ij} \le 1$, which are usually
referred to as correlations. Of course, (8.46) is a special case of (8.47) when $\rho_{ij} = 1$,
$\forall i, j$, and the aggregate capital given by (8.46) is an upper bound for the aggregate
capital given by (8.47).
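
The two rules can be written down in a few lines. This is a minimal sketch; the capital amounts and the correlation parameters are made-up numbers and the function names are illustrative.

```python
import math


def aggregate_by_summation(capital):
    """Simple summation rule (8.46)."""
    return sum(capital)


def aggregate_with_correlations(capital, rho):
    """Correlation-adjusted summation rule (8.47); rho is a d x d list of rows."""
    d = len(capital)
    total = sum(rho[i][j] * capital[i] * capital[j] for i in range(d) for j in range(d))
    return math.sqrt(total)


capital = [100.0, 50.0, 30.0]
rho = [
    [1.00, 0.50, 0.25],
    [0.50, 1.00, 0.30],
    [0.25, 0.30, 1.00],
]
all_ones = [[1.0] * 3 for _ in range(3)]

ec_sum = aggregate_by_summation(capital)                        # 180.0
ec_corr = aggregate_with_correlations(capital, rho)             # sqrt(20800)
ec_corr_ones = aggregate_with_correlations(capital, all_ones)   # reduces to (8.46)
```

With nonnegative capital amounts and parameters in $[0, 1]$, the correlation-adjusted figure cannot exceed the simple sum, and it coincides with the sum when every $\rho_{ij}$ equals one.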

The application of rules like (8.46) and (8.47) in the absence of any deeper con- 
sideration of multivariate models for the enterprise or the use of risk measures is 
referred to as rules-based aggregation. By contrast, the use of aggregation rules 
that can be theoretically justified by relating capital amounts to risk measures and 
multivariate models for losses is referred to as principles-based aggregation. In the 
following sections we give examples of the latter approach. 


8.4.1 Aggregation Based on Loss Distributions 


In this section we suppose that the overall loss of the enterprise over a fixed time
interval is given by $L_1 + \cdots + L_d$, where $L_1, \ldots, L_d$ are the losses arising from
sub-units of the enterprise (such as business units or asset classes on the balance
sheet). We consider a translation-invariant risk measure $\varrho$ and define a mean-adjusted
version of the risk measure by

\[ \varrho^{\mathrm{mean}}(L) = \varrho(L - E(L)) = \varrho(L) - E(L). \tag{8.48} \]

$\varrho^{\mathrm{mean}}$ can be thought of as the capital required to cover unexpected losses.

The capital requirements for the sub-units are given by $\mathrm{EC}_i = \varrho^{\mathrm{mean}}(L_i)$ for $i =
1, \ldots, d$, and the aggregate capital should be given by $\mathrm{EC} = \varrho^{\mathrm{mean}}(L_1 + \cdots + L_d)$.
We require an aggregation rule $f$ such that $\mathrm{EC} = f(\mathrm{EC}_1, \ldots, \mathrm{EC}_d)$.

As an example, suppose that we take the risk measure $\varrho(L) = k\,\mathrm{sd}(L) + E(L)$,
where sd denotes the standard deviation, $k$ is some positive constant, and second
moments of the loss distributions are assumed to be finite. Regardless of the underlying
distribution of $L_1, \ldots, L_d$ the standard deviation of $L = L_1 + \cdots + L_d$ satisfies

\[ \mathrm{sd}(L) = \sqrt{\sum_{i=1}^{d} \sum_{j=1}^{d} \rho_{ij}\,\mathrm{sd}(L_i)\,\mathrm{sd}(L_j)}, \tag{8.49} \]

where the $\rho_{ij}$ are the elements of the correlation matrix of $(L_1, \ldots, L_d)$ and the
aggregation rule (8.47) therefore follows in this case.
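
Identity (8.49) and the resulting aggregation rule can be verified for any covariance matrix. A minimal sketch follows, assuming a hypothetical $3 \times 3$ covariance matrix for $(L_1, L_2, L_3)$ and a hypothetical multiplier $k$; the function names are illustrative.

```python
import math


def standalone_sds(cov):
    return [math.sqrt(cov[i][i]) for i in range(len(cov))]


def correlation_matrix(cov):
    sd = standalone_sds(cov)
    d = len(cov)
    return [[cov[i][j] / (sd[i] * sd[j]) for j in range(d)] for i in range(d)]


def sd_of_sum(cov):
    """sd of L_1 + ... + L_d: the variance of the sum is the sum of all covariances."""
    return math.sqrt(sum(sum(row) for row in cov))


def aggregate_with_correlations(capital, rho):
    """Correlation-adjusted summation rule (8.47)."""
    d = len(capital)
    return math.sqrt(
        sum(rho[i][j] * capital[i] * capital[j] for i in range(d) for j in range(d))
    )


cov = [
    [4.0, 1.2, -0.5],
    [1.2, 9.0, 2.0],
    [-0.5, 2.0, 1.0],
]
k = 2.5

rho = correlation_matrix(cov)
capital = [k * s for s in standalone_sds(cov)]       # EC_i = k * sd(L_i)
ec_direct = k * sd_of_sum(cov)                       # EC = k * sd(L_1 + ... + L_d)
ec_rule = aggregate_with_correlations(capital, rho)  # rule (8.47)
```

The two aggregate figures agree up to rounding, whatever the distribution behind the covariance matrix; only finite second moments are used.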

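When the losses are elements of the linear space $\mathcal{M}$ in (8.34) and the distribution
of the underlying risk-factor changes $X$ is elliptical with finite covariance matrix,
then (8.49) and Theorem 8.28 (4) imply that (8.47) is justified for any positive-homogeneous,
translation-invariant and law-invariant risk measure. We now give
a more elegant proof of this fact that does not require us to assume finite second
moments of $X$.

Proposition 8.29. Let $X \sim E_d(\mu, \Sigma, \psi)$ with $E(X) = \mu$. Let $\mathcal{M} = \{L : L =
m + \lambda'X,\ \lambda \in \mathbb{R}^d,\ m \in \mathbb{R}\}$ be the space of linear portfolios and let $\varrho$ be a
positive-homogeneous, translation-invariant and law-invariant risk measure on $\mathcal{M}$.
For $L_1, \ldots, L_d \in \mathcal{M}$ let $\mathrm{EC}_i = \varrho^{\mathrm{mean}}(L_i)$ and $\mathrm{EC} = \varrho^{\mathrm{mean}}(L_1 + \cdots + L_d)$.
The capital amounts $\mathrm{EC}, \mathrm{EC}_1, \ldots, \mathrm{EC}_d$ then satisfy the aggregation rule (8.47),
where the $\rho_{ij}$ are elements of the correlation matrix $P = \wp(\tilde\Sigma)$ and where $\tilde\Sigma$ is the
dispersion matrix of the (elliptically distributed) random vector $(L_1, \ldots, L_d)$.


Proof. Let $L_i = \lambda_i'X + m_i$ for $i = 1, \ldots, d$. It follows from Theorem 8.28 (1) that

\[ \mathrm{EC}_i = \varrho(L_i) - E(L_i) = \sqrt{\lambda_i'\Sigma\lambda_i}\,\varrho(Y), \]

where $Y \sim S_1(\psi)$, and that

\begin{align*}
\mathrm{EC} &= \sqrt{(\lambda_1 + \cdots + \lambda_d)'\Sigma(\lambda_1 + \cdots + \lambda_d)}\,\varrho(Y) \\
&= \sqrt{\sum_{i=1}^{d} \sum_{j=1}^{d} \lambda_i'\Sigma\lambda_j}\,\varrho(Y) \\
&= \sqrt{\sum_{i=1}^{d} \sum_{j=1}^{d} \frac{\lambda_i'\Sigma\lambda_j}{\sqrt{\lambda_i'\Sigma\lambda_i}\sqrt{\lambda_j'\Sigma\lambda_j}}\,\mathrm{EC}_i\,\mathrm{EC}_j}.
\end{align*}

The dispersion matrix $\tilde\Sigma$ of $(L_1, \ldots, L_d)$ is now given by $\tilde\Sigma = \Lambda\Sigma\Lambda'$, where
$\Lambda \in \mathbb{R}^{d \times d}$ is the matrix with rows given by the vectors $\lambda_i'$. The correlation matrix
$P = \wp(\tilde\Sigma)$ clearly has elements given by

\[ \rho_{ij} = \frac{\lambda_i'\Sigma\lambda_j}{\sqrt{(\lambda_i'\Sigma\lambda_i)(\lambda_j'\Sigma\lambda_j)}}, \]

and the result follows.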


Proposition 8.29 implies that the aggregation rule (8.47) can be justified when 
we work with the mean-adjusted value-at-risk or expected shortfall risk measures 
if we are prepared to make the strong assumption that the underlying multivariate 
loss distribution is elliptical. 

Clearly the elliptical assumption is unlikely to hold in practice, so the theoretical 
support that allows us to view (8.47) as a principles-based approach will generally 
be lacking. However, even if the formula is used as a pragmatic rule, there are also 
practical problems with the approach. 


• The formula requires the specification of pairwise correlations between the
  losses $L_1, \ldots, L_d$. It will be difficult to obtain estimates of these correlations,
  since empirical data is generally available at the level of the underlying risk
  factors rather than the level of resulting portfolio losses.

  If, instead, the parameters are chosen by expert judgement, then there are
  compatibility requirements. In order to make sense, the $\rho_{ij}$ must form the
  elements of a positive-semidefinite correlation matrix. When a correlation
  matrix is pieced together from pairwise estimates it is quite easy to violate
  this condition, and the risk of this happening increases with dimension.

• If $L_1, \ldots, L_d$ are believed to have a non-elliptical distribution, then the limited
  range of attainable correlations for each pair $(L_i, L_j)$, as discussed in
  connection with Fallacy 2 in Section 7.2.2, is also a relevant constraint.

• The use of (8.47) offers no obvious way to incorporate tail dependence
  between the losses into the calculation of aggregate capital.
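
The compatibility requirement on the $\rho_{ij}$ can be tested mechanically. The sketch below attempts a Cholesky factorization, which succeeds only for strictly positive-definite matrices; this is an assumption of the sketch and is slightly stronger than the positive-semidefiniteness required in the text, so singular but valid correlation matrices would be rejected. The two example matrices are hypothetical expert-judgement inputs.

```python
import math


def is_positive_definite(matrix, tol=1e-12):
    """Attempt a Cholesky factorization; it succeeds only for positive-definite input."""
    d = len(matrix)
    chol = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            partial = sum(chol[i][k] * chol[j][k] for k in range(j))
            if i == j:
                pivot = matrix[i][i] - partial
                if pivot <= tol:
                    return False          # non-positive pivot: factorization fails
                chol[i][i] = math.sqrt(pivot)
            else:
                chol[i][j] = (matrix[i][j] - partial) / chol[j][j]
    return True


# Pairwise judgements that cannot hold together: units 2 and 3 are each highly
# correlated with unit 1 but are declared uncorrelated with one another.
incompatible = [
    [1.0, 0.9, 0.9],
    [0.9, 1.0, 0.0],
    [0.9, 0.0, 1.0],
]
# Raising the third pairwise value makes the three judgements compatible.
compatible = [
    [1.0, 0.9, 0.9],
    [0.9, 1.0, 0.7],
    [0.9, 0.7, 1.0],
]

incompatible_ok = is_positive_definite(incompatible)   # False
compatible_ok = is_positive_definite(compatible)       # True
```

Each entry of the first matrix lies in $[0, 1]$ and looks plausible on its own, yet the matrix as a whole is not a correlation matrix, which is the failure mode described in the first bullet point.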


It might be supposed that use of the summation formula (8.46) would avoid these
issues with correlation and yield a conservative upper bound for aggregate capital
for any possible underlying distribution of $(L_1, \ldots, L_d)$. While this is true if the risk
measure $\varrho$ is a coherent risk measure, it is not true in general if $\varrho$ is a non-subadditive
risk measure, such as VaR. This is an example of Fallacy 3 in Section 7.2.2.

It is possible that the underlying multivariate loss model is one where, for some
value of $\alpha$, $\mathrm{VaR}_\alpha(L_1 + \cdots + L_d) > \mathrm{VaR}_\alpha(L_1) + \cdots + \mathrm{VaR}_\alpha(L_d)$. In this case, if we set
$\mathrm{EC}_i = \mathrm{VaR}_\alpha(L_i) - E(L_i)$ and take the sum $\mathrm{EC}_1 + \cdots + \mathrm{EC}_d$, this will underestimate
the actual required capital $\mathrm{EC} = \mathrm{VaR}_\alpha(L) - E(L)$, where $L = L_1 + \cdots + L_d$.

In Section 8.4.4 we examine the problem of putting upper and lower bounds 
on aggregate capital when marginal distributions are known and marginal capital 
requirements are determined by the value-at-risk measure. 


8.4.2 Aggregation Based on Stressing Risk Factors 


Another situation where aggregation rules of the form (8.47) are used in practice
is in the aggregation of capital contributions computed by stressing individual risk
factors. An example of such an application is the standard formula approach to
Solvency II (see, for example, CEIOPS 2006). Capital amounts $\mathrm{EC}_1, \ldots, \mathrm{EC}_d$ are
computed by examining the effects on the balance sheet of extreme changes in a
number of key risk factors, and (8.47) is used to compute an overall capital figure
that takes into account the dependence of the risk factors.

To understand when the use of (8.47) may be considered to be a principles-based
approach to aggregation, suppose we write $x = X(\omega)$ for a scenario defined in
terms of changes in fundamental risk factors and $L(x)$ for the corresponding loss.
We assume that $L(x)$ is a known function and, for simplicity, that it is increasing
in each component of $x$. Following common practice, the $d$ risk factors are stressed
one at a time by predetermined amounts $k_1, \ldots, k_d$. Capital contributions for each
risk factor are set by computing

\[ \mathrm{EC}_i = L(k_i e_i) - L(E(X_i) e_i), \tag{8.50} \]

where $e_i$ denotes the $i$th unit vector and where $k_i > E(X_i)$ so that $\mathrm{EC}_i > 0$. The
value $\mathrm{EC}_i$ can be thought of as the loss incurred by stressing risk factor $i$ by an
amount $k_i$ relative to the impact of stressing it by its expected change, while all
other risk factors are held constant. One possibility is that the size of the stress event
is set at the level of the $\alpha$-quantile of the distribution of $X_i$, so that $k_i = q_\alpha(X_i)$ for
$\alpha$ close to 1. We now prove a simple result that justifies the use of the aggregation
rule (8.47) to combine the contributions $\mathrm{EC}_1, \ldots, \mathrm{EC}_d$ defined in (8.50) into an
aggregate capital EC.

Proposition 8.30. Let $X \sim E_d(\mu, \Sigma, \psi)$, with $E(X) = \mu$. Let $\mathcal{M}$ be the space of
linear portfolios (8.34) and let $\varrho$ be a positive-homogeneous, translation-invariant
and law-invariant risk measure on $\mathcal{M}$. Then, for any $L = L(X) = m + \lambda'X \in \mathcal{M}$
we have

\[ \varrho(L - E(L)) = \sqrt{\sum_{i=1}^{d} \sum_{j=1}^{d} \rho_{ij}\,\mathrm{EC}_i\,\mathrm{EC}_j}, \tag{8.51} \]

where $\mathrm{EC}_i = L(\varrho(X_i) e_i) - L(E(X_i) e_i)$ and $\rho_{ij}$ is an element of $\wp(\Sigma)$.


Proof. We observe that $\mathrm{EC}_i = \lambda_i \varrho(X_i) - \lambda_i E(X_i) = \lambda_i \varrho(X_i - E(X_i))$, and (8.51)
follows by application of Theorem 8.28 (3).
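
Proposition 8.30 can be illustrated in a Gaussian special case. The following is a minimal sketch assuming two jointly normal risk factors, $\varrho = \mathrm{VaR}_\alpha$ and positive portfolio weights; all parameter values and function names are hypothetical.

```python
import math
from statistics import NormalDist


def linear_loss(m, weights, x):
    """L(x) = m + lambda'x."""
    return m + sum(w * xi for w, xi in zip(weights, x))


def stress_contributions(m, weights, mean, cov, alpha):
    """EC_i = L(VaR_alpha(X_i) e_i) - L(E(X_i) e_i) for Gaussian risk factors."""
    z = NormalDist().inv_cdf(alpha)
    d = len(weights)
    contributions = []
    for i in range(d):
        stressed = mean[i] + z * math.sqrt(cov[i][i])     # alpha-quantile of X_i
        stressed_scenario = [stressed if k == i else 0.0 for k in range(d)]
        expected_scenario = [mean[i] if k == i else 0.0 for k in range(d)]
        contributions.append(
            linear_loss(m, weights, stressed_scenario)
            - linear_loss(m, weights, expected_scenario)
        )
    return contributions


def aggregate_with_correlations(capital, rho):
    """Correlation-adjusted summation rule (8.47)."""
    d = len(capital)
    return math.sqrt(
        sum(rho[i][j] * capital[i] * capital[j] for i in range(d) for j in range(d))
    )


m, weights = 0.5, [2.0, 1.0]
mean = [0.01, 0.02]
cov = [[0.04, 0.018], [0.018, 0.09]]
alpha = 0.995

sd = [math.sqrt(cov[i][i]) for i in range(2)]
rho = [[cov[i][j] / (sd[i] * sd[j]) for j in range(2)] for i in range(2)]

capital = stress_contributions(m, weights, mean, cov, alpha)
ec_aggregated = aggregate_with_correlations(capital, rho)

# Direct calculation of VaR_alpha(L - E(L)) for the normally distributed loss
portfolio_sd = math.sqrt(
    sum(weights[i] * cov[i][j] * weights[j] for i in range(2) for j in range(2))
)
ec_direct = NormalDist().inv_cdf(alpha) * portfolio_sd
```

The aggregated single-factor stresses reproduce the directly computed figure. The agreement rests on both the linearity of the loss and the elliptical (here Gaussian) factor distribution, and should not be expected once either assumption is dropped.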


Proposition 8.30 shows that, under a strong set of assumptions, we can aggregate 
the effects of single-risk-factor stresses to obtain an aggregate capital requirement 
that corresponds to application of any positive-homogeneous, translation-invariant 
and law-invariant risk measure to the distribution of the unexpected loss; this would 
apply to VaR, expected shortfall or one of the distortion risk measures of Section
8.2.1. It is this idea that underlies the use of (8.47) in Solvency II. However,
the key assumptions are, once again, the linearity of losses in the risk-factor changes 
and the elliptical distribution of risk-factor changes, both of which are simplistic in 
real-world applications. 

We can of course regard the use of (8.47) as a pragmatic, rules-based approach. 
The correlation parameters are defined at the level of the risk factors and, for typical 
market-risk factors such as returns on prices or rates, the data may be available 
to permit estimation of these parameters. For other risk factors, such as mortality 
and policy lapse rates in Solvency II applications, parameters may be set by expert 
judgement and the same issues mentioned in Section 8.4.1 apply. In particular, the 
matrix with components $\rho_{ij}$ must be positive definite in order for the procedure to
make any kind of sense. 

The summation rule may once again appear to be a conservative rule that avoids 
the problems related to estimating and setting correlations. However, it should be 
noted that in the presence of non-linear relationships between losses and risk factors, 
there can be complex interactions between risk factors that would require even 
higher capital than indicated by the sum of losses due to single-risk-factor stresses 
(see Notes and Comments). 


8.4.3 Modular versus Fully Integrated Aggregation Approaches 


The approaches discussed in Sections 8.4.1 and 8.4.2 can be described as modular 
approaches to risk capital. The risk is computed in modules or silos and then aggre- 
gated. In Section 8.4.1 the modules are defined in terms of business units or asset 
classes; in Section 8.4.2 the modules are defined in terms of individual risk factors. 
The former approach is arguably more natural because the losses across asset classes 
and business units are additive and it is possible to remove risks from the enterprise 
by selling parts of the business. The risks due to fundamental underlying risk factors 
are more pervasive and may manifest themselves in different parts of the balance
sheet; typically, their effects can be non-linear and they can only be reduced by
hedging. 

Regardless of the nature of the underlying silos the aggregation approaches we 
have described involve the specification of correlations and the use of (8.47) or its 
special case (8.46). We have observed that there are practical problems associated 
with choosing correlations, and in Chapter 7 we have argued that correlation gives 
only a partial description of a multivariate distribution and that copulas are a bet- 
ter approach to multivariate dependence modelling. It is natural to consider using 
copulas in aggregation. 

In the set-up of Section 8.4.1, where the total loss is given by $L = L_1 + \cdots + L_d$
and the $L_i$ are losses due to business units, suppose that we know, or can accurately
estimate, the marginal distributions $F_1, \ldots, F_d$ for each of the modules.
This is a necessary prerequisite for computing the marginal capital requirements
$EC_i = \varrho(L_i) - E(L_i)$. Instead of aggregating these marginal capital figures with
correlation, we could attempt to choose a suitable copula $C$ and build a multivariate
loss distribution $F(x) = C(F_1(x_1), \ldots, F_d(x_d))$ using the converse of Sklar’s Theorem
(7.3). This is referred to as the margins-plus-copula approach. Computation
of aggregate capital would typically proceed by generating large numbers of multivariate
losses from $F$, summing them to obtain simulated overall losses $L$, and then
applying empirical quantile or shortfall estimation techniques.
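
The simulation recipe just described can be sketched as follows for $d = 2$. The choice of a Gaussian copula, the Pareto margins, the correlation value and all function names are assumptions made for illustration only; they are not prescribed by the text.

```python
import random
from math import sqrt
from statistics import NormalDist

PHI = NormalDist()

def pareto_quantile(u, theta):
    """Quantile function of F(x) = 1 - (1 + x)**(-theta), x >= 0."""
    return (1.0 - u) ** (-1.0 / theta) - 1.0

def simulate_total_losses(n, rho, thetas, seed=1):
    """Draw n total losses L = L_1 + L_2 from a Gaussian copula with Pareto margins."""
    rng = random.Random(seed)
    totals = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        # Map to the copula scale; clamp to avoid u == 1 in the quantile function
        u1 = min(PHI.cdf(z1), 1.0 - 1e-12)
        u2 = min(PHI.cdf(z2), 1.0 - 1e-12)
        totals.append(pareto_quantile(u1, thetas[0]) + pareto_quantile(u2, thetas[1]))
    return totals

def empirical_var_es(losses, alpha):
    """Empirical alpha-quantile and the average of the losses at or above it."""
    xs = sorted(losses)
    k = min(int(alpha * len(xs)), len(xs) - 1)
    tail = xs[k:]
    return xs[k], sum(tail) / len(tail)

losses = simulate_total_losses(n=20000, rho=0.5, thetas=(3.0, 4.0))
var_99, es_99 = empirical_var_es(losses, 0.99)
print(f"VaR_0.99 ~ {var_99:.2f}, ES_0.99 ~ {es_99:.2f}")
```

Changing the copula (for example to one with tail dependence) while keeping the margins fixed changes the simulated aggregate capital, which is exactly the sensitivity discussed in the following paragraphs.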

The problem with this approach is the specification of C. Multivariate loss data 
from the business units may be sparse or non-existent, and expert judgment may 
have to be employed. This might involve deciding whether the copula should have 
a degree of tail dependence, taking a view on plausible levels of rank correlation 
between the pairs $(L_i, L_j)$ and then using the copula calibration methods based on
rank correlation described in Section 7.5.1. Clearly, this approach has at least as
many problems as choosing a correlation matrix to use in (8.47). It remains
a modular approach in which we start with models for the individual $L_i$ and add
dependence assumptions as an overlay. In Section 8.4.4 we will address the issue of 
dependence uncertainty in such a margins-plus-copula approach; in particular, we 
will quantify the “best-to-worst” gaps in VaR and ES estimation if only the marginal 
dfs of the losses are known. 

A more appealing approach, which we describe as a fully integrated approach, 
is to build multivariate models for the changes in underlying risk factors
$X = (X_1, \ldots, X_k)'$ and for the functionals $g_i : \mathbb{R}^k \to \mathbb{R}$ that give the losses $L_i = g_i(X)$,
$i = 1, \ldots, d$, for the different portfolios, desks or business units that make up the
enterprise. It is generally easier to build multivariate models for underlying risk
factors because more data exist at the level of the risk factors. The models for
$X$ may range in sophistication from margins-plus-copula distributional models to
more dynamic, financial econometric models. They are often referred to as economic
scenario generators. In the fully integrated approach, aggregate capital is derived
by applying risk measures to the distribution of $L = g_1(X) + \cdots + g_d(X)$, and the
losses in business units $L_i$ and $L_j$ are implicitly dependent through their mutual
dependence on $X$.


8.4.4 Risk Aggregation and Fréchet Problems


In the margins-plus-copula approach to risk aggregation described in Section 8.4.3 a
two-step procedure for the construction of a model for the total loss $L = L_1 + \cdots + L_d$
is followed.


(1) Find appropriate models (dfs) $F_1, \ldots, F_d$ for the marginal risks $L_1, \ldots, L_d$.
These can be obtained by statistical fitting to historical data, or postulated
a priori in a stress-testing exercise.


(2) Choose a suitable copula $C$ resulting in a joint model $C(F_1, \ldots, F_d)$ for the
random vector $\mathbf{L} = (L_1, \ldots, L_d)'$ from which the df for the total portfolio
loss $L$ can be derived.


Based on steps (1) and (2), any law-invariant risk measure $\varrho(L)$ can, in principle,
be calculated. The examples we will concentrate on in this section are $\varrho = \mathrm{VaR}_\alpha$ and
$\varrho = \mathrm{ES}_\alpha$. Note that there is nothing special about the sum structure of the portfolio
$L$; more general portfolios (or financial positions) $L = \Psi(L_1, \ldots, L_d)$ for suitable
functions $\Psi : \mathbb{R}^d \to \mathbb{R}$ could also be considered.

As mentioned in Section 8.4.3, there is a lot of model uncertainty surrounding 
the choice of the appropriate copula in step (2). In this section we will therefore 
drop step (2) while retaining step (1). Clearly, this means that quantities such as
$\mathrm{VaR}_\alpha(L) = \mathrm{VaR}_\alpha(L_1 + \cdots + L_d)$ can no longer be computed precisely due to the
lack of a fully specified model for the vector $\mathbf{L}$ and hence the aggregate loss $L$.

Instead we will try to find bounds for $\mathrm{VaR}_\alpha(L)$ given only the marginal information
from step (1). Problems of this type are known as Fréchet problems in the
literature. In this section we refer to the situation where only marginal information 
is available as dependence uncertainty. 

In order to derive bounds we introduce the class of rvs 


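$$\mathcal{S}_d = \mathcal{S}_d(F_1, \ldots, F_d) = \Bigl\{ L = \sum_{i=1}^{d} L_i : L_i \sim F_i,\ i = 1, \ldots, d \Bigr\}.$$


Clearly, every element of $\mathcal{S}_d$ is a feasible risk position satisfying step (1). The
problem of finding VaR bounds under dependence uncertainty now reduces to finding

$$\overline{\mathrm{VaR}}_\alpha(\mathcal{S}_d) = \sup\{\mathrm{VaR}_\alpha(L) : L \in \mathcal{S}_d(F_1, \ldots, F_d)\}$$

and

$$\underline{\mathrm{VaR}}_\alpha(\mathcal{S}_d) = \inf\{\mathrm{VaR}_\alpha(L) : L \in \mathcal{S}_d(F_1, \ldots, F_d)\}.$$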


We will use similar notation if $\mathrm{VaR}_\alpha$ is replaced by another risk measure $\varrho$; for
instance, we will write $\overline{\mathrm{ES}}_\alpha$ and $\underline{\mathrm{ES}}_\alpha$. In our main case of interest, when
$L = L_1 + \cdots + L_d$ with the $L_i$ variables as in step (1), we will often write
$\varrho(\mathcal{S}_d) = \varrho(L)$ and use similar notation for the corresponding upper and lower bounds. For
expected shortfall, which is a coherent and comonotone-additive risk measure (see
Definition 8.19), we have

$$\overline{\mathrm{ES}}_\alpha(L) = \sum_{i=1}^{d} \mathrm{ES}_\alpha(L_i),$$

and we see that the upper bound is achieved under comonotonicity. We often refer
to $\underline{\varrho}$ as the best and $\overline{\varrho}$ as the worst $\varrho$; this interpretation depends, of course, on the
context.

The calculation of $\overline{\mathrm{VaR}}_\alpha(\mathcal{S}_d)$, $\underline{\mathrm{VaR}}_\alpha(\mathcal{S}_d)$ and $\underline{\mathrm{ES}}_\alpha(\mathcal{S}_d)$ is difficult in general. We
will review some of the main results without proof; further references on this very
active research area can be found in Notes and Comments. The available results very
much depend on the dimension ($d = 2$ versus $d > 2$) and whether the portfolio is
homogeneous ($F_1 = \cdots = F_d$) or not. We begin with a result for the case $d = 2$.


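Proposition 8.31 (VaR, $d = 2$). Under the set-up above, for all $\alpha \in (0, 1)$,

$$\overline{\mathrm{VaR}}_\alpha(\mathcal{S}_2) = \inf_{x \in [0, 1-\alpha]} \{F_1^{-1}(\alpha + x) + F_2^{-1}(1 - x)\}$$

and

$$\underline{\mathrm{VaR}}_\alpha(\mathcal{S}_2) = \sup_{x \in [0, \alpha]} \{F_1^{-1}(x) + F_2^{-1}(\alpha - x)\}.$$


Proof. See Makarov (1981) and Rüschendorf (1982).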
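
To make the worst-case bound concrete, the sketch below evaluates $\inf_{x \in [0, 1-\alpha]} \{F_1^{-1}(\alpha + x) + F_2^{-1}(1 - x)\}$ for two standard normal margins by a simple grid search. The grid size and the function name are illustrative choices; the grid is taken over the open interval because the normal quantile function is infinite at level 1.

```python
from statistics import NormalDist

def worst_var_two_normals(alpha, n_grid=2001):
    """Grid approximation of inf_x { Phi^{-1}(alpha + x) + Phi^{-1}(1 - x) } over (0, 1 - alpha)."""
    q = NormalDist().inv_cdf
    width = 1.0 - alpha
    best = float("inf")
    for k in range(n_grid):
        x = width * (k + 0.5) / n_grid   # midpoints, so the endpoints 0 and 1 - alpha are avoided
        best = min(best, q(alpha + x) + q(1.0 - x))
    return best

alpha = 0.95
worst = worst_var_two_normals(alpha)
comonotonic = 2.0 * NormalDist().inv_cdf(alpha)
print(f"worst-case VaR = {worst:.2f}, comonotonic VaR = {comonotonic:.2f}")
```

By symmetry the infimum is attained at the midpoint $x = (1 - \alpha)/2$, so at $\alpha = 0.95$ the worst-case value is $2\Phi^{-1}(0.975) \approx 3.92$, compared with $2\Phi^{-1}(0.95) \approx 3.29$ in the comonotonic case.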


From the above proposition we already see that the optimal couplings—the 
dependence structures achieving the VaR bounds—combine large outcomes in one 
risk with small outcomes in the other. Next we give VaR bounds for higher dimen- 
sions, assuming a homogeneous portfolio. 


Proposition 8.32 (VaR, $d > 2$, homogeneous case). Suppose that $F := F_1 = \cdots = F_d$
and that for some $b \in \mathbb{R}$ the density function $f$ of $F$ (assumed to exist) is
decreasing on $[b, \infty)$. Then, for $\alpha \in [F(b), 1)$ and $X \sim F$,

$$\overline{\mathrm{VaR}}_\alpha(\mathcal{S}_d) = d\, E\bigl(X \mid X \in [F^{-1}(\alpha + (d-1)c),\, F^{-1}(1-c)]\bigr), \tag{8.52}$$

where $c$ is the smallest number in $[0, (1-\alpha)/d]$ such that

$$\int_{\alpha+(d-1)c}^{1-c} F^{-1}(t)\, dt \geq \frac{1 - \alpha - dc}{d} \bigl((d-1) F^{-1}(\alpha + (d-1)c) + F^{-1}(1-c)\bigr).$$

If the density $f$ of $F$ is decreasing on its support, then for $\alpha \in (0, 1)$ and $X \sim F$,

$$\underline{\mathrm{VaR}}_\alpha(\mathcal{S}_d) = \max\{(d-1) F^{-1}(0) + F^{-1}(\alpha),\ d\, E(X \mid X \leq F^{-1}(\alpha))\}. \tag{8.53}$$


Proof. For the proof of (8.52) see Wang, Peng and Yang (2013). The case (8.53)
follows by symmetry arguments (see Bernard, Jiang and Wang 2014).


Remark 8.33. First of all note the extra condition on the density $f$ of $F$ that is
needed to obtain the sharp bound (8.53) for $\underline{\mathrm{VaR}}_\alpha(\mathcal{S}_d)$: we need $f$ to be decreasing
on its full support rather than only on a certain tail region, which is sufficient for
(8.52). As a consequence, both (8.52) and (8.53) can be applied, for instance, to the
case where $F$ is Pareto, but for lognormal rvs only (8.52) applies.

Though the results (8.52) and (8.53) look rather involved, they exhibit an inter- 
esting structure. As in the case where d = 2, the extremal couplings combine large 
and small values of the underlying df F. More importantly, if c = 0, then (8.52) 
reduces to 


$$\overline{\mathrm{VaR}}_\alpha(\mathcal{S}_d) = \overline{\mathrm{ES}}_\alpha(\mathcal{S}_d). \tag{8.54}$$


The extremal coupling for VaR is rather special and differs from the extremal cou- 
pling for ES, which is of course comonotonicity. The condition c = 0 corresponds 
to the crucial notion of d-mixability (see Definition 8.35). 

The observation (8.54) is relevant to a discussion of the pros and cons of value- 
at-risk versus expected shortfall and the regulatory debate surrounding these risk 
measures. Although the upper bounds coincide in the case $c = 0$, it is much easier
to compute $\overline{\mathrm{ES}}_\alpha(\mathcal{S}_d)$ due to the comonotone additivity of expected shortfall (see
Embrechts et al. (2014) and Notes and Comments).
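
The bounds (8.52) and (8.53) can be evaluated numerically for the Pareto df $F(x) = 1 - (1 + x)^{-\theta}$, whose density is decreasing on its whole support. The following is a minimal sketch under the assumption $\theta > 1$. The grid scan followed by bisection is one possible way to locate the smallest admissible $c$, not a prescribed method, and all function names are illustrative.

```python
def q(t, theta):
    """Quantile function of Pa(theta, 1), i.e. F(x) = 1 - (1 + x)**(-theta)."""
    return (1.0 - t) ** (-1.0 / theta) - 1.0

def int_q(a, b, theta):
    """Closed-form integral of q over [a, b] (requires theta != 1)."""
    p = 1.0 - 1.0 / theta
    return ((1.0 - a) ** p - (1.0 - b) ** p) / p - (b - a)

def gap(c, theta, d, alpha):
    """Left side minus right side of the inequality that defines c for (8.52)."""
    a, b = alpha + (d - 1) * c, 1.0 - c
    return int_q(a, b, theta) - (b - a) / d * ((d - 1) * q(a, theta) + q(b, theta))

def worst_var(theta, d, alpha, n_scan=2000, n_bisect=60):
    """Evaluate (8.52): scan for the first grid point where the inequality holds, then bisect."""
    c_max = (1.0 - alpha) / d
    step = c_max / n_scan
    hi = next((k * step for k in range(1, n_scan)
               if gap(k * step, theta, d, alpha) >= 0.0), None)
    if hi is None:                      # only c = c_max works: the interval collapses to a point
        return d * q(1.0 - c_max, theta)
    lo = hi - step
    for _ in range(n_bisect):
        mid = 0.5 * (lo + hi)
        if gap(mid, theta, d, alpha) >= 0.0:
            hi = mid
        else:
            lo = mid
    a, b = alpha + (d - 1) * hi, 1.0 - hi
    return d * int_q(a, b, theta) / (b - a)   # d times the conditional mean on [F^{-1}(a), F^{-1}(b)]

def best_var(theta, d, alpha):
    """Evaluate (8.53); for this Pareto df F^{-1}(0) = 0."""
    lower_mean = int_q(0.0, alpha, theta) / alpha   # E(X | X <= F^{-1}(alpha))
    return max((d - 1) * q(0.0, theta) + q(alpha, theta), d * lower_mean)

theta, d, alpha = 3.0, 8, 0.95
es_upper = d * (int_q(alpha, 1.0, theta) / (1.0 - alpha))   # d * ES_alpha(X), the worst-case ES
print(round(worst_var(theta, d, alpha), 1), round(best_var(theta, d, alpha), 1), round(es_upper, 1))
```

For $\theta = 3$, $d = 8$ and $\alpha = 0.95$ the three figures should agree, up to rounding, with the corresponding entries of Table 8.1; the worst-case VaR lies slightly below the worst-case ES because the optimal $c$ is small but positive.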


Similar to Proposition 8.32, a sharp bound for the best ES case for a homogeneous
portfolio can be given, and this also requires a strong monotonicity condition for
the underlying density. Here the lower expected shortfall risk measure $\mathrm{LES}_\alpha$ enters;
for $\alpha \in (0, 1)$ this is defined to be

$$\mathrm{LES}_\alpha(X) = \frac{1}{\alpha} \int_0^\alpha \mathrm{VaR}_u(X)\, du = -\mathrm{ES}_{1-\alpha}(-X).$$


Proposition 8.34 (ES, $d > 2$, homogeneous case). Suppose that $F := F_1 = \cdots = F_d$,
that $F$ has a finite first moment and that the density function of $F$
(which is assumed to exist) is decreasing on its support. Then, for $\alpha \in [1 - dc, 1)$,
$\beta = (1 - \alpha)/d$ and $X \sim F$,

$$\underline{\mathrm{ES}}_\alpha(\mathcal{S}_d) = \frac{1}{\beta} \int_0^\beta \bigl((d-1) F^{-1}((d-1)t) + F^{-1}(1-t)\bigr)\, dt = (d-1)\, \mathrm{LES}_{(d-1)\beta}(X) + \mathrm{ES}_{1-\beta}(X), \tag{8.55}$$

where $c$ is the smallest number in $[0, 1/d]$ such that

$$\int_{(d-1)c}^{1-c} F^{-1}(t)\, dt \geq \frac{1 - dc}{d} \bigl((d-1) F^{-1}((d-1)c) + F^{-1}(1-c)\bigr).$$


Proof. See Bernard, Jiang and Wang (2014).
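
The integral in (8.55) has a closed form for the Pareto df $F(x) = 1 - (1 + x)^{-\theta}$ with $\theta > 1$. The sketch below evaluates it and also checks on a grid whether the requirement $\alpha \geq 1 - dc$ is met, using the fact that this is equivalent to the defining inequality for $c$ failing at every point below $\beta = (1 - \alpha)/d$. The function names and the grid resolution are illustrative assumptions.

```python
def q(t, theta):
    """Quantile function of Pa(theta, 1), i.e. F(x) = 1 - (1 + x)**(-theta)."""
    return (1.0 - t) ** (-1.0 / theta) - 1.0

def int_q(a, b, theta):
    """Closed-form integral of q over [a, b] (requires theta > 1 when b = 1)."""
    p = 1.0 - 1.0 / theta
    return ((1.0 - a) ** p - (1.0 - b) ** p) / p - (b - a)

def gap(c, theta, d):
    """Left side minus right side of the inequality that defines c in Proposition 8.34."""
    a, b = (d - 1) * c, 1.0 - c
    return int_q(a, b, theta) - (1.0 - d * c) / d * ((d - 1) * q(a, theta) + q(b, theta))

def formula_applies(theta, d, alpha, n_scan=2000):
    """Grid check of alpha >= 1 - d*c: the inequality must fail for every c below beta."""
    beta = (1.0 - alpha) / d
    return all(gap(k * beta / n_scan, theta, d) < 0.0 for k in range(1, n_scan))

def best_es(theta, d, alpha):
    """Evaluate (1/beta) * int_0^beta ((d-1) q((d-1)t) + q(1-t)) dt in closed form."""
    beta = (1.0 - alpha) / d
    low = int_q(0.0, (d - 1) * beta, theta)    # equals int_0^beta (d-1) q((d-1)t) dt
    high = int_q(1.0 - beta, 1.0, theta)       # equals int_0^beta q(1-t) dt
    return (low + high) / beta

for theta, d, alpha in [(3.0, 8, 0.95), (3.0, 8, 0.99), (10.0, 8, 0.95)]:
    if formula_applies(theta, d, alpha):
        print(theta, d, alpha, round(best_es(theta, d, alpha), 1))
    else:
        print(theta, d, alpha, "condition alpha >= 1 - dc not met on the grid")
```

For $\theta = 3$ and $d = 8$ the computed values should agree, up to rounding, with the best-case ES entries of Table 8.1. For lighter tails such as $\theta = 10$ the grid check indicates that $\alpha = 0.95$ falls outside the range covered by the proposition.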


An important tool in proofs of these results is a general concept of multivariate
negative dependence known as mixability, which is introduced next.


Definition 8.35. A df $F$ on $\mathbb{R}$ is called $d$-completely mixable ($d$-CM) if there exist
$d$ rvs $X_1, \ldots, X_d \sim F$ such that for some $k \in \mathbb{R}$,

$$P(X_1 + \cdots + X_d = dk) = 1. \tag{8.56}$$

$F$ is completely mixable if $F$ is $d$-CM for all $d \geq 2$. The dfs $F_1, \ldots, F_d$ on $\mathbb{R}$ are
called jointly mixable if there exist $d$ rvs $X_i \sim F_i$, $i = 1, \ldots, d$, such that for some
$c \in \mathbb{R}$,

$$P(X_1 + \cdots + X_d = c) = 1.$$


Clearly, if $F$ has finite mean $\mu$, we must have $k = \mu$ in (8.56). Complete mixability
is a concept of strong negative dependence. It is indeed this dependence structure
that yields the extremal couplings in Proposition 8.32. The above definition of
$d$-complete mixability and its link to dependence-uncertainty problems can be found
in Wang and Wang (2011). Examples of completely mixable dfs are the normal,
Student $t$, Cauchy and uniform distributions. In Rüschendorf and Uckelmann (2002)
it was shown that any continuous distribution function with a symmetric and unimodal
density is $d$-completely mixable for any $d \geq 2$. See Notes and Comments
for a historical perspective and further references.

In contrast to the above analytic results for the homogeneous case, very little
is known for non-homogeneous portfolios, i.e. for portfolios where the condition
$F_1 = \cdots = F_d$ does not hold. In general, however, there is a fast and efficient
numerical procedure for solving dependence-uncertainty problems that is called the
rearrangement algorithm (RA) (see Embrechts, Puccetti and Rüschendorf 2013).
The RA was originally worked out for the calculation of best/worst VaR bounds; it
can be generalized to other risk measures like expected shortfall. Mathematically,
the RA is based on the above idea of mixability. For instance, for the calculation
of $\overline{\mathrm{VaR}}_\alpha(L)$, one discretizes the $(1 - \alpha)100\%$ upper tail of the underlying factor
dfs $F_1, \ldots, F_d$, using $N = 100\,000$ bins, say. For the dimension $d$, values around
and above 1000, say, can easily be handled by the RA. It can similarly be used
for the calculation of $\underline{\mathrm{VaR}}_\alpha(L)$ and $\underline{\mathrm{ES}}_\alpha(L)$ in both the homogeneous and
non-homogeneous cases.
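
The core rearrangement step can be sketched in a few lines. What follows is a simplified variant for the worst-case VaR, written under several assumptions: only a left-endpoint discretization of the upper tails is used (so the result approximates the bound from below), the starting arrangement is deterministic, and the stopping rule, bin count and function name are illustrative. The full algorithm in the cited paper contains further details that are omitted here.

```python
from statistics import NormalDist

def rearrangement_worst_var(quantile_fns, alpha, n=1000, tol=1e-9, max_passes=100):
    """Approximate the worst-case VaR_alpha of a sum by rearranging discretized upper tails."""
    d = len(quantile_fns)
    # Left endpoints of n equal-probability bins on [alpha, 1)
    levels = [alpha + (1.0 - alpha) * i / n for i in range(n)]
    cols = [[qf(u) for u in levels] for qf in quantile_fns]
    row_sums = [sum(col[i] for col in cols) for i in range(n)]
    best = min(row_sums)
    for _ in range(max_passes):
        for j in range(d):
            others = [row_sums[i] - cols[j][i] for i in range(n)]
            order = sorted(range(n), key=others.__getitem__)  # rows by ascending sum of the other columns
            values = sorted(cols[j], reverse=True)            # largest value goes to the smallest partial sum
            for rank, i in enumerate(order):
                cols[j][i] = values[rank]
                row_sums[i] = others[i] + values[rank]
        new_best = min(row_sums)
        converged = abs(new_best - best) <= tol
        best = new_best
        if converged:
            break
    return best   # the smallest row sum estimates the worst-case VaR

phi_inv = NormalDist().inv_cdf
normal_case = rearrangement_worst_var([phi_inv, phi_inv], alpha=0.95)

def pareto3(u):
    """Quantile function of F(x) = 1 - (1 + x)**(-3)."""
    return (1.0 - u) ** (-1.0 / 3.0) - 1.0

pareto_case = rearrangement_worst_var([pareto3] * 8, alpha=0.95, n=500)
print(round(normal_case, 2), round(pareto_case, 1))
```

Each pass makes every column oppositely ordered to the sum of the remaining columns, which flattens the row sums and pushes the smallest one upwards; this is the discrete counterpart of the mixability idea mentioned above. For two standard normal margins the result is close to 3.92.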

With the results discussed above, including the RA, we can now calculate several 
quantities related to diversification, (non-)coherence, and model and dependence 
uncertainty. We restrict our attention to the additive portfolio $L = L_1 + \cdots + L_d$ under
the set-up in step (1). It will be useful to consider several functions $\mathcal{X} : \mathbb{R}^2 \to \mathbb{R}$
that compare risk measures under different dependence assumptions. Examples
encountered in the literature are the following.


Super/subadditivity indices. $a = \mathrm{VaR}_\alpha(L)$, $b = \mathrm{VaR}^+_\alpha(L)$, and $\mathcal{X}_1(a, b) = a/b$,
$\mathcal{X}_2(a, b) = 1 - (a/b)$, $\mathcal{X}_3(a, b) = b - a$. $\mathrm{VaR}^+_\alpha(L)$ denotes the comonotonic
case, i.e. $\mathrm{VaR}^+_\alpha(L) = \sum_{i=1}^{d} \mathrm{VaR}_\alpha(L_i)$.


Worst superadditivity ratio. $a = \overline{\mathrm{VaR}}_\alpha(L)$, $b = \mathrm{VaR}^+_\alpha(L)$, and $\mathcal{X}_4(a, b) = a/b$;
the case is similar for the best superadditivity ratio, replacing $\overline{\mathrm{VaR}}_\alpha(L)$ by
$\underline{\mathrm{VaR}}_\alpha(L)$.


Dependence-uncertainty spread. Either $a = \underline{\mathrm{VaR}}_\alpha(L)$, $b = \overline{\mathrm{VaR}}_\alpha(L)$ or
$a = \underline{\mathrm{ES}}_\alpha(L)$, $b = \overline{\mathrm{ES}}_\alpha(L)$ and $\mathcal{X}_5(a, b) = b - a$. A further interesting measure
compares the VaR dependence-uncertainty spread with the ES dependence-uncertainty
spread.


Best/worst (VaR, ES) ratios. Either $a = \underline{\mathrm{VaR}}_\alpha(L)$, $b = \underline{\mathrm{ES}}_\alpha(L)$ or $a = \overline{\mathrm{VaR}}_\alpha(L)$,
$b = \overline{\mathrm{ES}}_\alpha(L)$ and $\mathcal{X}_6(a, b) = b/a$, say.
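
As a small worked illustration, the indices above can be evaluated from the figures reported in Table 8.1 for Pareto margins with $\theta = 3$, $d = 8$ and $\alpha = 95\%$. The variable names are illustrative, and the independent-case value is used as one possible choice of $a = \mathrm{VaR}_\alpha(L)$ in the first index.

```python
# Figures from Table 8.1 (theta = 3, d = 8, alpha = 95%)
var_worst, var_comon, var_indep, var_best = 24.1, 13.7, 8.1, 2.9
es_worst, es_best = 24.6, 7.2

x1 = var_indep / var_comon       # sub/superadditivity index a/b under independence
x4 = var_worst / var_comon       # worst superadditivity ratio
var_spread = var_worst - var_best   # dependence-uncertainty spread of VaR
es_spread = es_worst - es_best      # dependence-uncertainty spread of ES
x6_worst = es_worst / var_worst     # worst-case (VaR, ES) ratio b/a

print(f"X1 = {x1:.3f}, X4 = {x4:.3f}")
print(f"VaR spread = {var_spread:.1f}, ES spread = {es_spread:.1f}")
print(f"worst ES / worst VaR = {x6_worst:.3f}")
```

With these inputs the worst superadditivity ratio is about 1.76, the VaR spread (21.2) exceeds the ES spread (17.4), and the worst-case ES is only about 2% above the worst-case VaR.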

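As we already observed in (8.54), whenever $c = 0$ in Proposition 8.32 we have
that $\overline{\mathrm{VaR}}_\alpha(\mathcal{S}_d) = \overline{\mathrm{ES}}_\alpha(\mathcal{S}_d)$. The following result extends this observation in an
asymptotic way.


Proposition 8.36 (asymptotic equivalence of ES and VaR). Suppose that $L_i \sim F_i$,
$i \geq 1$, and that

(i) for some $k > 1$, $E(|L_i - E(L_i)|^k)$ is uniformly bounded, and

(ii) for some $\alpha \in (0, 1)$,

$$\liminf_{d \to \infty} \frac{1}{d} \sum_{i=1}^{d} \mathrm{ES}_\alpha(L_i) > 0.$$

Then, as $d \to \infty$,

$$\frac{\overline{\mathrm{ES}}_\alpha(\mathcal{S}_d)}{\overline{\mathrm{VaR}}_\alpha(\mathcal{S}_d)} = 1 + O(d^{1/k - 1}). \tag{8.57}$$


Proof. See Embrechts, Wang and Wang (2014).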


Proposition 8.36 shows that under very general assumptions typically encountered
in QRM practice, we have that for $d$ large, $\overline{\mathrm{VaR}}_\alpha(L) \approx \overline{\mathrm{ES}}_\alpha(L)$. The proposition
also provides a rate of convergence. From numerical examples it appears that these
asymptotic results hold fairly accurately even for small to medium values of the portfolio
dimension $d$ (see Example 8.40). From the same paper (Embrechts, Wang and
Wang 2014) we add a final result related to the VaR and ES dependence-uncertainty
spreads.


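Proposition 8.37 (dependence-uncertainty spread of VaR versus ES). Take
$0 < \alpha_1 \leq \alpha_2 < 1$ and assume that the dfs $F_i$, $i \geq 1$ (with $X_i \sim F_i$), satisfy condition (i) of
Proposition 8.36 as well as

(iii) $\displaystyle \liminf_{d \to \infty} \frac{1}{d} \sum_{i=1}^{d} \mathrm{LES}_{\alpha_1}(X_i) > 0$ and

(iv) $\displaystyle \limsup_{d \to \infty} \frac{\sum_{i=1}^{d} E(X_i)}{\sum_{i=1}^{d} \mathrm{ES}_{\alpha_1}(X_i)} < 1$.

Then

$$\liminf_{d \to \infty} \frac{\overline{\mathrm{VaR}}_{\alpha_2}(\mathcal{S}_d) - \underline{\mathrm{VaR}}_{\alpha_2}(\mathcal{S}_d)}{\overline{\mathrm{ES}}_{\alpha_1}(\mathcal{S}_d) - \underline{\mathrm{ES}}_{\alpha_1}(\mathcal{S}_d)} \geq 1. \tag{8.58}$$


Proof. See Embrechts, Wang and Wang (2014).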


[Figure 8.3: vertical axis $\mathrm{VaR}_\alpha(L_1 + L_2)$; horizontal axis $\alpha$, from 0.5 to 1.0.]

Figure 8.3. The worst-case $\overline{\mathrm{VaR}}_\alpha(L)$ (solid line) plotted against $\alpha$ for two standard normal
risks; the case of comonotonic risks $\mathrm{VaR}^+_\alpha(L)$ is shown as a dotted line for comparison.


Remark 8.38. Propositions 8.36 and 8.37 are relevant to the ongoing discussion 
of risk measures for the calculation of regulatory capital. Recall that under the 
Basel framework for banking and also the Solvency II framework for insurance, 
VaR-based capital requirements are to be compared and contrasted with those based 
on expected shortfall. In particular, comparisons are made between VaR and ES 
at different quantiles, e.g. between $\mathrm{VaR}_{0.99}$ and $\mathrm{ES}_{0.975}$. The above propositions
add a component of dependence uncertainty to these discussions. In particular, the 
dependence-uncertainty spread of VaR is generally larger than that of ES. For a 
numerical illustration of this, consider Example 8.40 below. 


Examples. We consider examples where the aggregate loss is given by $L = L_1 + \cdots + L_d$.
For any risk measure $\varrho$ we denote by $\varrho^+(L)$ the value of $\varrho$ when $L_1, \ldots, L_d$
are comonotonic, and we write $\varrho^\perp(L)$ when they are independent. In a first example
we consider the case when $d = 2$ and $F_1 = F_2 = \Phi$, the standard normal df. In
Example 8.40, higher-dimensional portfolios with Pareto margins are considered.


Example 8.39 (worst VaR for a portfolio with normal margins). For $i = 1, 2$ let
$F_i = \Phi$. In Figure 8.3 we have plotted $\overline{\mathrm{VaR}}_\alpha(L)$ calculated using Proposition 8.31
as a function of $\alpha$ together with the curve corresponding to the comonotonic case
$\mathrm{VaR}^+_\alpha(L)$ calculated using Proposition 7.20. The fact that the former lies above
the latter implies the existence of portfolios with normal margins for which VaR
is not subadditive. For example, for $\alpha = 0.95$, the upper bound is 3.92, whereas
$\mathrm{VaR}_\alpha(L_i) = 1.645$, so, for the worst VaR portfolio,
$\mathrm{VaR}_{0.95}(L) = 3.92 > 3.29 = \mathrm{VaR}_{0.95}(L_1) + \mathrm{VaR}_{0.95}(L_2)$.
The density function of the distribution of $(L_1, L_2)$ that
leads to $\overline{\mathrm{VaR}}_\alpha(L)$ is shown in Figure 8.4 (see Embrechts, Höing and Puccetti
(2005) for further details).


Figure 8.4. Contour and perspective plots of the density function of the distribution of
$(L_1, L_2)$ leading to the worst-case $\overline{\mathrm{VaR}}_\alpha(L)$ for $L = L_1 + L_2$ at the $\alpha = 0.95$ level when
the $L_i$ are standard normal.


Example 8.40 (VaR and ES bounds for Pareto margins). In Tables 8.1 and 8.2 we
have applied the various results to a homogeneous Pareto case where $L_i \sim \mathrm{Pa}(\theta, 1)$,
$i = 1, \ldots, d$, so that the common df is $F(x) = 1 - (1 + x)^{-\theta}$, $x \geq 0$ (see
also Section A.2.8 in the appendix). In Table 8.1 we consider cases where $\theta \geq 3$,
corresponding to finite-variance distributions; in Table 8.2 we consider cases where
$\theta \leq 2$, corresponding to infinite-variance distributions. The values $d = 8$ and
$d = 56$ are chosen with applications to operational risk in mind. In that context
$d = 8$ corresponds to a relatively low-dimensional aggregation problem and $d = 56$
to a moderately high-dimensional aggregation problem (see Chapter 13).

We use the analytic results from Propositions 8.32 and 8.34, as well as the RA.
For the independent case, simulation is used. The figures given are appropriately
rounded. For the homogeneous case we only report the analytic bounds (with a
numerical root search for $c$ in the propositions). The RA bounds are close to identical
to their analytical counterparts, with only very small deviations for heavy-tailed
dfs, i.e. for small $\theta$. We note that for heavy-tailed risks, the RA requires a fine
discretization, and hence considerably more time is needed to calculate $\underline{\mathrm{ES}}_\alpha(L)$.

Both tables confirm the results discussed above, i.e. $\overline{\mathrm{ES}}_\alpha(L)/\overline{\mathrm{VaR}}_\alpha(L)$ is close to
1 even for $d = 8$; the dependence-uncertainty spreads behave as stated in Proposition 8.37,
and finally, in the $\mathrm{Pa}(\theta, 1)$, $\theta > 1$, case we have that

$$\lim_{\alpha \uparrow 1} \frac{\overline{\mathrm{ES}}_\alpha(L)}{\mathrm{VaR}^+_\alpha(L)} = \frac{\theta}{\theta - 1},$$

which can be observed in the examples given above, though the convergence is much
slower here. The latter result holds more generally for distributions with regularly
varying tails (see Definition 5.7 and Karamata’s Theorem (Appendix A.1.4) and
recall that this includes distributions like the Student t and loggamma distributions).


Table 8.1. ES and VaR bounds for $L = L_1 + \cdots + L_d$, where $L_i \sim \mathrm{Pa}(\theta, 1)$, $i = 1, \ldots, d$,
with df $F(x) = 1 - (1 + x)^{-\theta}$, $x \geq 0$, for $\theta \geq 3$. See Example 8.40 for a discussion.
Rows: worst = upper bound, + = comonotonic case, ⊥ = independent case, best = lower bound.

                          d = 8                          d = 56
θ = 10         α=95%    α=99%   α=99.9%      α=95%    α=99%   α=99.9%

VaR worst        4.0      6.1       9.7       28.0     42.6      68.1
VaR +            2.8      4.7       8.0       19.6     32.8      55.7
VaR ⊥            1.5      1.9       2.5        7.8      8.6       9.6
VaR best         0.7      0.8       1.0        5.1      5.9       6.2
ES worst         4.0      6.1       9.7       28.0     42.6      68.1
ES ⊥             1.8      2.2       2.8        8.3      9.1      10.0
ES best          0.9      1.2       1.7        6.2      6.2       6.2

θ = 5          α=95%    α=99%   α=99.9%      α=95%    α=99%   α=99.9%

VaR worst       10.2     17.1      31.7       71.4    119.8     222.7
VaR +            6.6     12.1      23.8       46.0     84.7     166.9
VaR ⊥            3.7      5.0       7.2       18.3     20.7      24.1
VaR best         1.6      1.8       3.0       11.0     12.9      13.8
ES worst        10.2     17.1      31.8       71.4    119.8     222.7
ES ⊥             4.5      5.9       8.7       19.8     22.2      26.0
ES best          2.5      3.8       6.5       14.0     14.0      14.3

θ = 3          α=95%    α=99%   α=99.9%      α=95%    α=99%   α=99.9%

VaR worst       24.1     46.9     110.2      171.9    333.7     783.7
VaR +           13.7     29.1      72.0       96.0    203.9     504.0
VaR ⊥            8.1     12.3      23.0       39.1     47.6      67.2
VaR best         2.9      3.6       9.0       20.4     24.9      27.2
ES worst        24.6     47.7     112.0      172.0    333.9     784.0
ES ⊥            11.0     17.0      32.9       44.9     56.2      85.5
ES best          7.2     12.9      29.0       28.6     31.3      56.4

Table 8.2 also includes the case $\theta = 0.8$, i.e. an infinite-mean case (for which ES
is not defined). Here we note that $\mathrm{VaR}^\perp_\alpha(L) > \mathrm{VaR}^+_\alpha(L)$, so this gives an example of
superadditivity of $\mathrm{VaR}_\alpha$ in the case of independence (see the discussion following
Example 2.25).


Table 8.2. ES and VaR bounds for $L = L_1 + \cdots + L_d$, where $L_i \sim \mathrm{Pa}(\theta, 1)$, $i = 1, \ldots, d$,
with df $F(x) = 1 - (1 + x)^{-\theta}$, $x \geq 0$, for $\theta \leq 2$. See Example 8.40 for a discussion.
Rows: worst = upper bound, + = comonotonic case, ⊥ = independent case, best = lower bound.

                          d = 8                          d = 56
θ = 2          α=95%    α=99%   α=99.9%      α=95%    α=99%   α=99.9%

VaR worst         59      142       465        440     1054      3454
VaR +             28       72       245        194      504      1715
VaR ⊥             18       35        96         89      132       293
VaR best           5        9        31         36       46        53
ES worst          64      152       498        445     1064      3486
ES ⊥              31       63       184        123      205       518
ES best           24       56       178         75      149       472

θ = 1.5        α=95%    α=99%   α=99.9%      α=95%    α=99%   α=99.9%

VaR worst        135      409      1928       1100     3323     15629
VaR +             51      164       792        357     1150      5544
VaR ⊥             39       98       413        207      421      1574
VaR best           8       21        99         56       77        99
ES worst         169      509      2392       1182     3563     16744
ES ⊥              98      265      1159        419     1016      4126
ES best           88      258      1199        323      945      4390

θ = 0.8        α=95%    α=99%   α=99.9%      α=95%    α=99%   α=99.9%

VaR worst       2250    16873    300182      35168   263301   4683172
VaR +            330     2522     44979       2313    17653    314855
VaR ⊥            620     4349     75877       7318    49858    862855
VaR best          41      315      5622        207      433      5622


Notes and Comments


The use of a standard formula approach based on the kind of aggregation embodied
in (8.47) is permitted under Solvency II; see CEIOPS (2006), a document produced
by the Committee of European Insurance and Occupational Pensions Supervisors
(now EIOPA).

In the banking context the summation approach in (8.46) is commonly used, 
particularly for the aggregation of capital requirements for market and credit risk. 
As explained by Breuer et al. (2010) this is commonly justified by assuming that 
credit risk arises from the banking book and market risk from the trading book, but 
they point out that, for derivative instruments depending on both market and credit 
risks, it can potentially lead to underestimation of risk. In contrast, Alessandri and 
Drehmann (2010) study integration of credit risk and interest-rate risk in the banking 
book and conduct simulations suggesting that summation of capital for the two risk 
types is likely to be too conservative. Drehmann, Sorensen and Stringa (2010)
argue that credit and interest-rate risk must be assessed jointly in the banking book.
Kretzschmar, McNeil and Kirchner (2010) make similar points about the importance
of developing fully integrated models rather than modular approaches.

A summary of methodological practice in economic capital models before the 
2007 crisis is presented in a comprehensive survey by the International Financial 
Risk Institute that included both banks and insurance companies. In this survey, the 
prevailing approach to integration is reported to be the use of correlation matrices 
(see IFRI Foundation and CRO Forum 2007). This approach was favoured by over 
75% of the surveyed banks, with the others using simulation approaches based on 
scenario generation or hybrid approaches. In the insurance industry there was more 
diversity in the approaches used for integration: around 35% of respondents used 
the correlation approach and about the same number used simulation; the remainder 
reported the use of copulas or hybrid approaches. 

There is a large literature on Fréchet problems: see, for instance, Chapter 2 in
Rüschendorf (2013). From a QRM perspective, Embrechts and Puccetti (2006) gave
the field a considerable boost. The latter paper also contains the most important references
to the early literature. Historically, the question of bounding the df of a sum
of rvs with given marginals goes back to Kolmogorov and was answered by Makarov
(1981) for $d = 2$. Frank, Nelsen and Schweizer (1987) restated Makarov’s result
using the notion of a copula. Independently, Rüschendorf (1982) gave a very elegant
proof of the same result using duality. Williamson and Downs (1990) introduced
the use of dependence information. Numerous other authors (especially in analysis 
and actuarial mathematics) have contributed to this area. Besides the comprehen- 
sive book by Müller and Stoyan (2002), several other texts in actuarial mathematics 
contain interesting contributions on dependence modelling: for an introduction, see 
Chapter 10 in Kaas et al. (2001). A rich set of optimization problems within an 
actuarial context are to be found in De Vylder (1996); see especially “Part II: Opti- 
mization Theory”, where the author “shows how to obtain best upper and lower 
bounds on functionals T (F) of the df F of a risk, under moment or other integral 
constraints”. An excellent account is to be found in Denuit and Charpentier (2004). 
The definitive account from an actuarial point of view is Denuit et al. (2005). A 
wealth of actuarial examples is to be found in the two extensive articles Hürlimann
(2008a) and Hürlimann (2008b).

The rearrangement algorithm (RA) for VaR appeared in Embrechts, Puccetti and
Rüschendorf (2013) and was based on earlier work by Puccetti and Rüschendorf
(2012). Full details on the RA are collected by Giovanni Puccetti at
https://sites.google.com/site/rearrangementalgorithm/. The interested reader may also
search the literature for probability box (or p-box) and the related Dempster-Shafer
theory. These search terms lead to well-established theory and numerous examples
in the realm of engineering, computer science and economics. For expected
shortfall, the RA was worked out in Puccetti (2013). For the analytical results and
a discussion of the sharpness of the various bounds for VaR and ES, the papers
cited for the corresponding propositions give an excellent introduction. Some further
interesting papers are Bernard, Jiang and Wang (2014), Bernard et al. (2013)
and Bernard, Rüschendorf and Vanduffel (2013).

The notion of complete mixability leads to a condition of negative dependence
for multivariate ($d > 2$) random vectors. Recent developments in this field are
summarized in Puccetti and Wang (2015), Puccetti and Wang (2014) and Wang and
Wang (2014).

Rosenberg and Schuermann (2006) give some idea of the applicability of aggregation
ideas used in this chapter. The authors construct the joint risk distribution
for a typical, large, internationally active bank using the method of copulas and 
aggregate risk measures across the categories of market, credit and operational risk. 

For an illustration of the ideas and results of Section 8.4.4 in the practical envi- 
ronment of a Norwegian financial group, see Dimakos and Aas (2004) and Aas and 
Puccetti (2014). The latter paper contains a discussion of the best/worst couplings. 
See also Embrechts, Puccetti and Rüschendorf (2013) on this topic.


8.5 Capital Allocation 


The final section of this chapter essentially looks at the converse problem to Sec- 
tion 8.4. Given a model for aggregate losses we now consider how the overall capital 
requirement may be disaggregated into additive contributions attributable to the dif- 
ferent sub-portfolios or assets that make up the overall portfolio. 


8.5.1 The Allocation Problem 


As in Section 8.4.1 let the rvs $L_1, \ldots, L_d$ represent the losses (or negative P&Ls)
arising from d different lines of business, or the losses corresponding to d different 
asset classes on the balance sheet of a firm. In this section we will refer to these 
sub-units of a larger portfolio simply as investments. The allocation problem can be 
motivated by considering the question of how we might measure the risk-adjusted 
performance of different investments within a portfolio. 

The performance of investments is usually measured using a RORAC (return on 
risk-adjusted capital) approach, i.e. by considering a ratio of the form 


$$\frac{\text{expected profit of investment } i}{\text{risk capital for investment } i}. \tag{8.59}$$

The general approach embodied in (8.59) raises the question of how we should 
calculate the risk capital for an investment that is part of a larger portfolio. It should 
not simply be the stand-alone risk capital for that investment considered in isolation; 
this would neglect the issue of diversification and give an inaccurate measure of the 
performance of an investment within the larger portfolio. Instead, the risk capital for 
an investment within a portfolio should reflect the contribution of that investment 
to the overall riskiness of the portfolio. A two-step procedure for determining these 
contributions is used in practice. 


(1) Compute the overall risk capital $\varrho(L)$, where $L = \sum_{i=1}^{d} L_i$ and $\varrho$ is a particular
risk measure such as VaR, ES or a mean-adjusted version of one of
these (see (8.48)); note that at this stage we are not stipulating that $\varrho$ must be
coherent.

(2) Allocate the capital $\varrho(L)$ to the individual investments according to some
mathematical capital allocation principle such that, if $AC_i$ denotes the capital
allocated to the investment with potential loss $L_i$ (the so-called risk contribution
of unit $i$), the sum of the risk contributions corresponds to the overall
risk capital $\varrho(L)$.

In this section we are interested in step (2) of the procedure; loosely speaking, we
require a mapping that takes as input the individual losses $L_1, \ldots, L_d$ and the risk
measure $\varrho$ and yields as output the vector of risk contributions $(AC_1, \ldots, AC_d)$
such that

$$\varrho(L) = \sum_{i=1}^{d} AC_i, \tag{8.60}$$

and such a mapping will be called a capital allocation principle. The relation (8.60)
is sometimes called the full allocation property since all of the overall risk capital
$\varrho(L)$ (not more, not less) is allocated to the investments; we consider this property
to be an integral part of the definition of an allocation principle. Of course, there are 
other properties of a capital allocation principle that are desirable from an economic 
viewpoint; we first make some formal definitions and give examples of allocation 
principles before discussing further properties. 


The formal set-up. Let $L_1, \dots, L_d$ be rvs on a common probability space
$(\Omega, \mathcal{F}, P)$ representing losses (or profits) for $d$ investments. For our discussion
it will be useful to consider portfolios where the weights of the individual
investments are varied with respect to our basic portfolio $(L_1, \dots, L_d)$, which is regarded
as a fixed random vector. That is, we consider an open set $\Lambda \subset \mathbb{R}^d \setminus \{0\}$ of portfolio
weights such that $\mathbf{1} \in \Lambda$ and define for $\lambda \in \Lambda$ the loss $L(\lambda) = \sum_{i=1}^d \lambda_i L_i$; the loss
of our actual portfolio is of course $L(\mathbf{1})$. Let $\varrho$ be some risk measure defined on a set
$\mathcal{M}$ that contains the rvs $\{L(\lambda) : \lambda \in \Lambda\}$. As in Section 8.3.1 we use the associated
risk-measure function $r_\varrho\colon \Lambda \to \mathbb{R}$ with $r_\varrho(\lambda) = \varrho(L(\lambda))$.


8.5.2 The Euler Principle and Examples 


From now on we restrict our attention to risk measures that are positive homoge- 
neous. This may be a coherent risk measure, or a mean-adjusted version of a coherent 
risk measure as in (8.48); it may also be VaR (or a mean-corrected version of VaR) or 
the standard deviation risk measure. Obviously, the associated risk-measure function 
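must satisfy $r_\varrho(t\lambda) = t\,r_\varrho(\lambda)$ for all $t > 0$, $\lambda \in \Lambda$, so $r_\varrho\colon \Lambda \to \mathbb{R}$ is a
positive-homogeneous function of a vector argument. Recall Euler’s well-known rule that
states that if $r_\varrho$ is positive homogeneous and differentiable at $\lambda \in \Lambda$, we have

$$r_\varrho(\lambda) = \sum_{i=1}^d \lambda_i \frac{\partial r_\varrho}{\partial \lambda_i}(\lambda). \tag{8.61}$$

If we apply this at $\lambda = \mathbf{1}$, we get, using that $\varrho(L) = r_\varrho(\mathbf{1})$,

$$\varrho(L) = \sum_{i=1}^d \frac{\partial r_\varrho}{\partial \lambda_i}(\mathbf{1}).$$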

This suggests the following definition. 
Definition 8.41 (Euler capital allocation principle). If $r_\varrho$ is a positive-homogeneous
risk-measure function, which is differentiable at $\lambda = \mathbf{1}$, then the Euler
capital allocation principle associated with $\varrho$ has risk contributions

$$AC_i^{\varrho} = \frac{\partial r_\varrho}{\partial \lambda_i}(\mathbf{1}), \quad 1 \le i \le d.$$


The Euler principle is sometimes called allocation by the gradient, and it obvi- 
ously gives a full allocation of the risk capital. We now look at a number of specific 
examples of Euler allocations corresponding to different choices of risk measure $\varrho$.


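Standard deviation and the covariance principle. Consider the risk-measure function
$r_{\mathrm{SD}}(\lambda) = \sqrt{\operatorname{var}(L(\lambda))}$ and write $\Sigma$ for the covariance matrix of $(L_1, \dots, L_d)$.
Then we have $r_{\mathrm{SD}}(\lambda) = (\lambda' \Sigma \lambda)^{1/2}$, from which it follows that

$$AC_i^{\varrho} = \frac{\partial r_{\mathrm{SD}}}{\partial \lambda_i}(\mathbf{1}) = \frac{\sum_{j=1}^d \operatorname{cov}(L_i, L_j)}{r_{\mathrm{SD}}(\mathbf{1})} = \frac{\operatorname{cov}(L_i, L)}{\sqrt{\operatorname{var}(L)}}.$$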


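This formula is known as the covariance principle. If we consider more generally
a risk measure of the form $\varrho(L) = E(L) + \kappa\,\mathrm{SD}(L)$ for some $\kappa > 0$, we get
$r_\varrho(\lambda) = \lambda' E(\mathbf{L}) + \kappa\,r_{\mathrm{SD}}(\lambda)$, where $\mathbf{L} = (L_1, \dots, L_d)'$, and hence

$$AC_i^{\varrho} = E(L_i) + \kappa\,\frac{\operatorname{cov}(L_i, L)}{\sqrt{\operatorname{var}(L)}}.$$

VaR and VaR contributions. Suppose that $r^{\alpha}_{\mathrm{VaR}}(\lambda) = q_\alpha(L(\lambda))$. In this case it can
be shown that, subject to technical conditions,

$$AC_i^{\varrho} = \frac{\partial r^{\alpha}_{\mathrm{VaR}}}{\partial \lambda_i}(\mathbf{1}) = E(L_i \mid L = q_\alpha(L)), \quad 1 \le i \le d. \tag{8.62}$$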


The derivation of (8.62) is more involved than that of the covariance principle, and 
we give a justification following Tasche (2000) under the simplifying assumption 
that the loss distribution of $(L_1, \dots, L_d)$ has a joint density $f$. In the following
lemma we denote by $\phi(u, l_2, \dots, l_d) = f_{L_1 \mid L_2, \dots, L_d}(u \mid l_2, \dots, l_d)$ the conditional
density of $L_1$.


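Lemma 8.42. Assume that $d \ge 2$ and that $(L_1, \dots, L_d)$ has a joint density. Then,
for any vector $(\lambda_1, \dots, \lambda_d)$ of portfolio weights such that $\lambda_1 \ne 0$, we find that

(i) $L(\lambda)$ has density

$$f_{L(\lambda)}(t) = |\lambda_1|^{-1} E\Bigl(\phi\Bigl(\lambda_1^{-1}\Bigl(t - \sum_{j=2}^d \lambda_j L_j\Bigr), L_2, \dots, L_d\Bigr)\Bigr),$$

and

(ii) for $i = 2, \dots, d$,

$$E(L_i \mid L(\lambda) = t) = \frac{E\bigl(L_i\,\phi\bigl(\lambda_1^{-1}(t - \sum_{j=2}^d \lambda_j L_j), L_2, \dots, L_d\bigr)\bigr)}{E\bigl(\phi\bigl(\lambda_1^{-1}(t - \sum_{j=2}^d \lambda_j L_j), L_2, \dots, L_d\bigr)\bigr)}, \quad \text{a.s.}$$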
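Proof. For (i) consider the case $\lambda_1 > 0$ and observe that we can write

$$\begin{aligned}
P(L(\lambda) \le t) &= E\bigl(P(L(\lambda) \le t \mid L_2, \dots, L_d)\bigr) \\
&= E\Bigl(P\Bigl(L_1 \le \lambda_1^{-1}\Bigl(t - \sum_{j=2}^d \lambda_j L_j\Bigr) \Bigm| L_2, \dots, L_d\Bigr)\Bigr) \\
&= E\Bigl(\int_{-\infty}^{\lambda_1^{-1}(t - \sum_{j=2}^d \lambda_j L_j)} \phi(u, L_2, \dots, L_d)\,du\Bigr).
\end{aligned}$$

The assertion follows by differentiating under the expectation.

For (ii) observe that we can write

$$E(L_i \mid L(\lambda) = t) = \lim_{\delta \to 0} \frac{\delta^{-1} E\bigl(L_i I_{\{t < L(\lambda) \le t + \delta\}}\bigr)}{\delta^{-1} P(t < L(\lambda) \le t + \delta)} = \frac{(\partial/\partial t)\,E\bigl(L_i I_{\{L(\lambda) \le t\}}\bigr)}{f_{L(\lambda)}(t)},$$

provided $f_{L(\lambda)}(t) \ne 0$. The result follows by applying to the numerator a conditioning
technique similar to the one used in the proof of (i).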


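We now explain why (8.62) follows from Lemma 8.42. Since the rv $L(\lambda)$ has a
density, we have $P(L(\lambda) \le q_\alpha(L(\lambda))) = \alpha$. Writing $k(t) = \lambda_1^{-1}(t - \sum_{j=2}^d \lambda_j L_j)$,
we have

$$\alpha = P\bigl(L(\lambda) \le r^{\alpha}_{\mathrm{VaR}}(\lambda)\bigr) = E\Bigl(\int_{-\infty}^{k(r^{\alpha}_{\mathrm{VaR}}(\lambda))} \phi(u, L_2, \dots, L_d)\,du\Bigr). \tag{8.63}$$

We take derivatives of (8.63) with respect to $\lambda_i$ for $i = 2, \dots, d$ to get

$$0 = \lambda_1^{-1} E\Bigl(\Bigl(\frac{\partial r^{\alpha}_{\mathrm{VaR}}(\lambda)}{\partial \lambda_i} - L_i\Bigr)\,\phi\bigl(k(r^{\alpha}_{\mathrm{VaR}}(\lambda)), L_2, \dots, L_d\bigr)\Bigr).$$

Solving this expression for $\partial r^{\alpha}_{\mathrm{VaR}}(\lambda)/\partial \lambda_i$, using part (ii) of Lemma 8.42 and
substituting $\lambda = \mathbf{1}$ yields (8.62), as desired. Analogous calculations can be done for $i = 1$
and $\lambda_1 < 0$. Tasche (2000) makes the derivations mathematically rigorous by using
the implicit function theorem and giving all necessary conditions.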
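
As an illustration of (8.62), consider the special case in which $(L_1, \dots, L_d)$ is multivariate normal with $\operatorname{var}(L) > 0$. This normality assumption is made only for this sketch and is not part of the derivation above. Then $(L_i, L)$ is bivariate normal, the conditional expectation is linear in the conditioning value, and the VaR contributions are available in closed form, with $\Phi$ denoting the standard normal df:

```latex
\begin{align*}
E(L_i \mid L = t) &= E(L_i) + \frac{\operatorname{cov}(L_i, L)}{\operatorname{var}(L)}\,\bigl(t - E(L)\bigr), \\
q_\alpha(L) &= E(L) + \sqrt{\operatorname{var}(L)}\;\Phi^{-1}(\alpha), \\
AC_i^{\varrho} = E\bigl(L_i \mid L = q_\alpha(L)\bigr)
  &= E(L_i) + \frac{\operatorname{cov}(L_i, L)}{\sqrt{\operatorname{var}(L)}}\;\Phi^{-1}(\alpha).
\end{align*}
```

Summing the last line over $i$ recovers $q_\alpha(L)$, so the full allocation property holds, and in this special case the VaR contributions coincide with the mean-plus-standard-deviation allocation with $\kappa = \Phi^{-1}(\alpha)$.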


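Expected shortfall and shortfall contributions. Now consider using the risk-measure
function $r^{\alpha}_{\mathrm{ES}}(\lambda) = E(L(\lambda) \mid L(\lambda) \ge q_\alpha(L(\lambda)))$ corresponding to expected
shortfall. It follows from Definition 2.12 that we can write

$$r^{\alpha}_{\mathrm{ES}}(\lambda) = \frac{1}{1 - \alpha} \int_\alpha^1 r^{u}_{\mathrm{VaR}}(\lambda)\,du,$$

where we make use of the notation $r^{u}_{\mathrm{VaR}}(\lambda) = q_u(L(\lambda))$ as above. We apply the
Euler principle by again computing the derivative with respect to $\lambda_i$. Assuming the
differentiability of $r^{u}_{\mathrm{VaR}}(\lambda)$, we have, with $L = L(\mathbf{1})$,

$$\frac{\partial r^{\alpha}_{\mathrm{ES}}}{\partial \lambda_i}(\mathbf{1}) = \frac{1}{1 - \alpha} \int_\alpha^1 \frac{\partial r^{u}_{\mathrm{VaR}}}{\partial \lambda_i}(\mathbf{1})\,du = \frac{1}{1 - \alpha} \int_\alpha^1 E(L_i \mid L = q_u(L))\,du.$$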
Now we assume that the density $f_L$ of $L$ is strictly positive so that the df of $L$ has
a differentiable inverse and we can make the change of variables $v = q_u(L) = F_L^{-1}(u)$.
Since $dv/du = (f_L(v))^{-1}$, we get

$$\frac{\partial r^{\alpha}_{\mathrm{ES}}}{\partial \lambda_i}(\mathbf{1}) = \frac{1}{1 - \alpha} \int_{q_\alpha(L)}^{\infty} E(L_i \mid L = v)\,f_L(v)\,dv = \frac{1}{1 - \alpha}\,E\bigl(L_i;\, L \ge q_\alpha(L)\bigr).$$

Hence the Euler capital allocation takes the form

$$AC_i^{\varrho} = E(L_i \mid L \ge \mathrm{VaR}_\alpha(L)), \quad L := L(\mathbf{1}), \tag{8.64}$$


where $AC_i^{\varrho}$ is known as the expected shortfall contribution of investment possibility
(or line of business) i. This is a popular allocation principle in practice, and is often 
considered to be preferable to the covariance principle and the principle based on 
VaR contributions. See Notes and Comments for literature on its use in practice in 
the context of credit portfolios. 
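
The expected shortfall contributions in (8.64) are conditional tail means, so they can be estimated from simulated losses by averaging each $L_i$ over the scenarios in which the total loss is at or beyond its empirical quantile. The following is a minimal sketch of that idea, not a production estimator: the function name `es_contributions`, the Gaussian loss model, the covariance matrix and the seed are all assumptions chosen for illustration.

```python
import numpy as np

def es_contributions(losses, alpha):
    """Estimate E(L_i | L >= VaR_alpha(L)) for each column of `losses`.

    losses : array of shape (n_scenarios, d), one column per investment
    alpha  : confidence level, e.g. 0.99
    Returns (contributions, es_total), where es_total estimates ES_alpha(L).
    """
    total = losses.sum(axis=1)            # portfolio loss L = L(1) in each scenario
    var_alpha = np.quantile(total, alpha)  # empirical VaR_alpha(L)
    tail = total >= var_alpha              # scenarios in the tail event
    return losses[tail].mean(axis=0), total[tail].mean()

# Hypothetical example: three jointly normal losses centred at zero.
rng = np.random.default_rng(0)
cov = np.array([[1.0, 0.3, 0.0],
                [0.3, 2.0, 0.5],
                [0.0, 0.5, 0.5]])
sims = rng.multivariate_normal(np.zeros(3), cov, size=200_000)

ac, es_total = es_contributions(sims, alpha=0.99)
print("ES contributions:", ac)
print("sum of contributions:", ac.sum(), "estimated ES of total:", es_total)
```

Because the contributions are averages over the same set of tail scenarios, they add up to the estimated expected shortfall of the total loss by construction, which is the sample counterpart of the full allocation property (8.60).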


Euler allocation for elliptical loss distributions. In the following corollary to
Theorem 8.28 we consider the special case of an elliptical loss distribution for the
vector $(L_1, \dots, L_d)$. We consider this distribution to be centred at zero so that it
really represents fluctuations of the loss around its mean; centring $(L_1, \dots, L_d)$ is
of course equivalent to working with the mean-adjusted version of some
translation-invariant risk measure $\varrho$. We find that the relative amounts of capital allocated to
each investment opportunity are always the same, regardless of whether we base an 
Euler allocation on the standard deviation, VaR or expected shortfall risk measures, 
or indeed any positive-homogeneous risk measure. Allocation is therefore very sim- 
ple in this case: depending on our choice of risk measure we calculate the total risk 
capital to be allocated and then use a simple partitioning formula given in (8.65) 
below. 


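Corollary 8.43. Assume that $r_\varrho\colon \Lambda \to \mathbb{R}$ is the risk-measure function of a
positive-homogeneous and law-invariant risk measure $\varrho$. Let $\mathbf{L} = (L_1, \dots, L_d)' \sim E_d(\mathbf{0}, \Sigma, \psi)$.
Then, under an Euler allocation, the relative capital allocation is given by

$$\frac{AC_i^{\varrho}}{AC_j^{\varrho}} = \frac{\sum_{k=1}^d \Sigma_{ik}}{\sum_{k=1}^d \Sigma_{jk}}, \quad 1 \le i, j \le d. \tag{8.65}$$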


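Proof. From the proof of Theorem 8.28 we deduce that, by the positive homogeneity
of the risk measure, we have

$$r_\varrho(\lambda) = \varrho(L(\lambda)) = \varrho\Bigl(\sum_{i=1}^d \lambda_i L_i\Bigr) = \sqrt{\lambda' \Sigma \lambda}\;\varrho(Y_1),$$

where $Y_1$ is the first component of a spherical random vector with characteristic
generator $\psi$. For the Euler allocation we get

$$AC_i^{\varrho} = \frac{\partial r_\varrho}{\partial \lambda_i}(\mathbf{1}) = \frac{\sum_{k=1}^d \Sigma_{ik}}{\sqrt{\mathbf{1}' \Sigma \mathbf{1}}}\;\varrho(Y_1),$$

from which the result follows.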
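
The covariance principle and the partitioning formula (8.65) can be checked numerically with a few lines of linear algebra. The sketch below is illustrative only: the function name `covariance_allocation` and the covariance matrix are hypothetical, and the standard deviation is used as the risk measure.

```python
import numpy as np

def covariance_allocation(cov):
    """Euler contributions for the standard deviation risk measure: cov(L_i, L) / sd(L)."""
    ones = np.ones(cov.shape[0])
    cov_with_total = cov @ ones              # cov(L_i, L) = sum_j cov(L_i, L_j)
    sd_total = np.sqrt(ones @ cov @ ones)    # r_SD(1) = sqrt(1' Sigma 1)
    return cov_with_total / sd_total

# Hypothetical covariance (dispersion) matrix of three losses.
cov = np.array([[1.0, 0.3, 0.0],
                [0.3, 2.0, 0.5],
                [0.0, 0.5, 0.5]])

ac = covariance_allocation(cov)
shares = cov.sum(axis=1) / cov.sum()         # row sums of Sigma over the grand total, cf. (8.65)

print("contributions:", ac)
print("sum of contributions:", ac.sum(), "sd of total:", np.sqrt(cov.sum()))
print("relative shares:", ac / ac.sum(), "row-sum shares:", shares)
```

The contributions sum to the standard deviation of the total loss, and their relative sizes equal the row sums of $\Sigma$ divided by the sum of all its entries. By Corollary 8.43 the same shares apply to any positive-homogeneous, law-invariant risk measure in the centred elliptical case; only the total amount of capital changes.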
8.5.3 Economic Properties of the Euler Principle


In this section we show that the Euler principle has a number of good economic
properties. As in the previous section we consider a positive-homogeneous risk
measure $\varrho$ and we assume that the corresponding risk-measure function $r_\varrho$ is
continuously differentiable in $\mathbb{R}^d \setminus \{0\}$ (a positive-homogeneous function is typically
not differentiable at $\lambda = 0$). By $AC_i^{\varrho} = \partial r_\varrho(\mathbf{1})/\partial \lambda_i$ we then denote the associated
risk contributions under the Euler principle.


Compatibility with a RORAC approach. We define the RORAC of the overall loss
by $\mathrm{RORAC}(L) := E(-L)/\varrho(L)$; the portfolio-related RORAC of investment unit
$i$ is defined as

$$\mathrm{RORAC}(L_i \mid L) := \frac{E(-L_i)}{\varrho(L_i \mid L)} = \frac{E(-L_i)}{AC_i^{\varrho}},$$

where it is tacitly assumed that the denominator is strictly positive. The Euler
principle is then compatible with a RORAC approach in the following sense: if investment
opportunity $i$ performs better than the overall portfolio $L$ in the RORAC metric, then
the RORAC of the overall portfolio is increased if one increases slightly the weight
of unit $i$. The Euler principle therefore gives correct signals for investment decisions.
In mathematical terms, RORAC compatibility means that there is some $\varepsilon > 0$ such
that for all $0 < h < \varepsilon$ it holds that

$$\bigl(\mathrm{RORAC}(L_i \mid L) > \mathrm{RORAC}(L)\bigr) \implies \bigl(\mathrm{RORAC}(L + hL_i) > \mathrm{RORAC}(L)\bigr). \tag{8.66}$$


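In order to establish (8.66) it suffices to show that $\mathrm{RORAC}(L_i \mid L) > \mathrm{RORAC}(L)$
implies that $(d/dh)\,\mathrm{RORAC}(L + hL_i)|_{h=0} > 0$. Denote by $e_i$ the $i$th unit vector.
Then it holds that

$$\begin{aligned}
\frac{d}{dh}\,\mathrm{RORAC}(L + hL_i)\Big|_{h=0} &= \frac{d}{dh}\,\frac{E(-(L + hL_i))}{r_\varrho(\mathbf{1} + h e_i)}\Big|_{h=0} \\
&= \frac{1}{r_\varrho(\mathbf{1})^2}\Bigl(E(-L_i)\,r_\varrho(\mathbf{1}) - E(-L)\,\frac{\partial r_\varrho}{\partial \lambda_i}(\mathbf{1})\Bigr).
\end{aligned}$$

Recall that $\varrho(L) = r_\varrho(\mathbf{1})$ and that $AC_i^{\varrho} = \partial r_\varrho(\mathbf{1})/\partial \lambda_i$. Hence the last expression is
strictly positive if $E(-L_i)/AC_i^{\varrho} > E(-L)/\varrho(L)$, as claimed.
In fact, it can be shown that for a positive-homogeneous $\varrho$ the Euler principle is
the only capital allocation principle that satisfies the RORAC compatibility (8.66)
(see Tasche (1999) for details).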
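
The RORAC compatibility property (8.66) can be illustrated numerically. The sketch below assumes the standard deviation as the positive-homogeneous risk measure and uses hypothetical expected profits $E(-L_i)$ and a hypothetical covariance matrix. It compares each unit's portfolio-related RORAC with the portfolio RORAC and then checks the effect of slightly increasing that unit's weight.

```python
import numpy as np

mean_profit = np.array([0.10, 0.30, 0.05])   # hypothetical expected profits E(-L_i)
cov = np.array([[1.0, 0.3, 0.0],
                [0.3, 2.0, 0.5],
                [0.0, 0.5, 0.5]])            # hypothetical covariance matrix of the losses

def risk(lam):
    """Risk-measure function r(lambda) when rho is the standard deviation."""
    return float(np.sqrt(lam @ cov @ lam))

def rorac(lam):
    """RORAC of the portfolio with weights lambda: E(-L(lambda)) / r(lambda)."""
    return float(lam @ mean_profit) / risk(lam)

ones = np.ones(3)
ac = cov @ ones / risk(ones)                 # Euler contributions (covariance principle)
unit_rorac = mean_profit / ac                # RORAC(L_i | L)
base = rorac(ones)                           # RORAC(L)

h = 0.01                                     # small increase in the weight of one unit
bumped = np.array([rorac(ones + h * np.eye(3)[i]) for i in range(3)])

for i in range(3):
    print(f"unit {i}: beats portfolio = {unit_rorac[i] > base}, "
          f"RORAC rises after bump = {bumped[i] > base}")
```

In this example only the second unit has a portfolio-related RORAC above that of the portfolio, and it is also the only unit whose weight increase raises the portfolio RORAC, in line with (8.66).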




Diversification benefit. Suppose that the risk measure $\varrho$ is positive homogeneous
and subadditive, as is the case for a coherent risk measure or a mean-adjusted version
thereof. In that case, since $\varrho(L) \le \sum_{i=1}^d \varrho(L_i)$, the overall risk capital required for
the portfolio is smaller than the sum of the risk capital required for the business units
on a stand-alone basis. In practice, the difference $\sum_{i=1}^d \varrho(L_i) - \varrho(L)$ is known as
the diversification benefit. It is reasonable to require that each business unit profits
from the diversification benefit in the sense that the individual risk contribution of
unit $i$ does not exceed the stand-alone capital charge $\varrho(L_i)$ (otherwise there would
be an incentive for unit $i$ to leave the firm, at least in theory). We now show that the
Euler principle does indeed satisfy the inequality

$$AC_i^{\varrho} \le \varrho(L_i), \quad 1 \le i \le d. \tag{8.67}$$


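The key is the following inequality: for a convex and positive-homogeneous function
$f\colon \mathbb{R}^d \to \mathbb{R}$ that is continuously differentiable in $\mathbb{R}^d \setminus \{0\}$, it holds for all $\lambda, \tilde{\lambda}$ with
$\lambda \ne -\tilde{\lambda}$ that

$$f(\lambda) \ge \sum_{i=1}^d \lambda_i \frac{\partial f}{\partial \lambda_i}(\lambda + \tilde{\lambda}). \tag{8.68}$$

If we apply this inequality with $f = r_\varrho$ (which is convex as $\varrho$ is positive
homogeneous and subadditive), $\lambda = e_i$ and $\tilde{\lambda} = \mathbf{1} - e_i$, we get the inequality
$r_\varrho(e_i) \ge \partial r_\varrho(\mathbf{1})/\partial \lambda_i$ and hence (8.67).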

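It remains to establish the inequality (8.68). Since $f$ is convex it holds for all
$x, y \in \mathbb{R}^d$, $x \ne 0$, that

$$f(y) \ge f(x) + \sum_{i=1}^d (y_i - x_i) \frac{\partial f(x)}{\partial x_i}.$$

Moreover, by Euler’s rule we have $f(x) = \sum_{i=1}^d x_i\,\partial f(x)/\partial x_i$ and hence

$$f(y) \ge \sum_{i=1}^d y_i \frac{\partial f(x)}{\partial x_i}.$$

Substituting $y = \lambda$, $x = \lambda + \tilde{\lambda}$ gives the result.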

The work of Kalkbrener (2005) and Denault (2001) takes this analysis one step 
further. In these papers it is shown that under suitable technical conditions the Euler 
principle is the only capital allocation principle that satisfies a slight strengthening of 
the diversification-benefit inequality (8.67). Obviously, this gives additional support 
for using the Euler principle if one works in the realm of coherent risk measures. 
From a practical point of view, the use of expected shortfall and expected shortfall 
contributions might be a reasonable choice in many application areas, particularly 
for credit risk management and loan pricing (see Notes and Comments, where this 
issue is discussed further). 
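
The diversification-benefit inequality (8.67) is easy to verify for the standard deviation risk measure, where the Euler contribution $\operatorname{cov}(L_i, L)/\sqrt{\operatorname{var}(L)}$ cannot exceed the stand-alone standard deviation of $L_i$. The short sketch below uses a hypothetical covariance matrix; the variable names are illustrative.

```python
import numpy as np

# Hypothetical covariance matrix of three business-unit losses.
cov = np.array([[1.0, 0.3, 0.0],
                [0.3, 2.0, 0.5],
                [0.0, 0.5, 0.5]])
ones = np.ones(3)

sd_total = np.sqrt(ones @ cov @ ones)      # rho(L): overall risk capital
stand_alone = np.sqrt(np.diag(cov))        # rho(L_i): stand-alone capital of each unit
ac = cov @ ones / sd_total                 # Euler contributions (covariance principle)
benefit = stand_alone.sum() - sd_total     # diversification benefit

print("stand-alone capital:", stand_alone)
print("Euler contributions:", ac)
print("diversification benefit:", benefit)
```

Every contribution is below the corresponding stand-alone figure, and the total saving equals the diversification benefit because the contributions add up to the overall risk capital.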


Notes and Comments 


A broad, non-technical discussion of capital allocation and performance measure- 
ment is to be found in Matten (2000) (see also Klaassen and van Eeghen 2009). The 
term “Euler principle” seems to have first been used in Patrik, Bernegger and Rüegg 
(1999). The result (8.62) is found in Gouriéroux, Laurent and Scaillet (2000) and 
Tasche (2000); the former paper assumes that the losses have a joint density and 
the latter gives a slightly more general result as well as technical details concerning 
the differentiability of the VaR and ES risk measures with respect to the portfolio 
composition. Differentiability of the coherent premium principle of Section 2.3.5 is 
discussed in Fischer (2003). The derivation of allocation principles from properties 
of risk measures is also to be found in Goovaerts, Dhaene and Kaas (2003) and
Goovaerts, van den Borre and Laeven (2005). 

For the arguments concerning suitability of risk measures for performance mea- 
surement, see Tasche (1999) and Tasche (2008). An axiomatic approach to capital 
allocation is found in Kalkbrener (2005) and Denault (2001). For an early contribu- 
tion on game theory applied to cost allocation in an insurance context, see Lemaire 
(1984). 

Applications to credit risk are found in Kalkbrener, Lotter and Overbeck (2004) 
and Merino and Nyfeler (2004); these make strong arguments in favour of the 
use of expected shortfall contributions. On the other hand, Pfeifer (2004) contains 
some compelling examples to show that expected shortfall as a risk measure and 
expected shortfall contributions as an allocation method may have some serious 
deficiencies when used in non-life insurance. The existence of rare, extreme events 
may lead to absurd capital allocations when based on expected shortfall. The reader 
is therefore urged to reflect carefully before settling on a specific risk measure and 
allocation principle. It may also be questionable to base a “coherent” risk-sensitive 
capital allocation on formal criteria only; for further details on this from a non-life 
insurance perspective see Koryciorz (2004). 

Risk-adjusted performance measures are widely used in industry in the context 
of capital budgeting and performance measurement. A good overview of current 
practice is given in Chapter 14 of Crouhy, Galai and Mark (2001) (see also Klaassen 
and van Eeghen 2009). An analysis of risk management and capital budgeting for 
financial institutions from an economic viewpoint is given in Froot and Stein (1998). 


Part III 


Applications 


9 


Market Risk 


In this chapter we look at methods for measuring the market risk in portfolios 
of traded instruments. We emphasize the use of statistical models and techniques 
introduced in Part II of the book. While we draw on material from most of the 
foregoing chapters, essential prerequisites are Chapter 2, in which the basic risk 
measurement problem was introduced, and Chapter 4 on financial time series. The 
material is divided into three sections. 

In Section 9.1 we revisit the topic of risk factors and mappings, first described 
in very general terms in Section 2.2. We develop the modelling framework in more 
detail in this chapter for the specific problem of modelling market risk in a bank’s 
trading book, where derivative positions are common and the regulator requires risk 
to be measured over short time horizons such as one day or two trading weeks. 

Section 9.2 is devoted to the topic of market-risk measurement. Assuming that 
the portfolio has been mapped to risk factors, we describe the various statistical 
approaches that are used in industry to estimate loss distributions and risk
measures like VaR or expected shortfall. These methods include the variance-covariance
(delta-normal), historical simulation and Monte Carlo methods.

The subject of backtesting the performance of such methods is treated in Sec- 
tion 9.3. We describe commonly used model-validation procedures based on VaR 
violations as well as more recent proposals for comparing methods using scoring 
functions based on elicitability theory. 


9.1 Risk Factors and Mapping 


The key idea in this section is that of a loss operator, which is introduced in
Section 9.1.1. This is a function that relates portfolio losses to changes in the risk
factors and is effectively the function that a bank must evaluate in order to
determine the P&L of its trading book under scenarios for future risk-factor changes.
Since the time to maturity or expiry has an impact on the value of many market
instruments, we consider the issue of different timescales for risk measurement and
valuation in detail. In Section 9.1.2 we show how the typically non-linear loss
operator can be approximated over short time intervals by linear (delta) and quadratic
(delta-gamma) functions.

The methodology is applied to a portfolio of zero-coupon bonds in Section 9.1.3, 
and it is shown that the linear and quadratic approximations to the loss operator have 
interpretations in terms of the classical bond pricing concepts of duration and
convexity. Since the mapping of fixed-income portfolios is typically a high-dimensional
problem, we consider factor modelling strategies for reducing the complexity of the 
mapping exercise in Section 9.1.4. 


9.1.1 The Loss Operator 


Consider a portfolio of assets subject to market risk, such as a collection of stocks and
bonds or a book of derivatives. The value of the portfolio is given by the continuous-time
stochastic process $(V(t))_{t \in \mathbb{R}}$, where it is assumed that $V(t)$ is known at time $t$;
this means that the instruments in the portfolio can either be marked-to-market or
marked to an appropriate model (see Section 2.2.2 for discussion of these concepts).

For a given time horizon $\Delta t$, such as one or ten days in a typical market-risk
application, the P&L of the portfolio over the period $[t, t + \Delta t]$ is given by
$V(t + \Delta t) - V(t)$. We find it convenient to consider the negative P&L $-(V(t + \Delta t) - V(t))$
and to represent the risk by the right tail of this quantity, which we refer to simply
as the loss. It is assumed that the portfolio composition remains constant over this
period and that there is no intermediate income or fees (the so-called clean or
no-action P&L).

In transforming the problem of analysing the loss distribution to a problem in
financial time-series analysis, it is convenient to measure time in units of $\Delta t$ and
to introduce appropriate time-series notation. In a number of places in this chapter
we move from a generic continuous-time process $Y(t)$ to the time series $(Y_t)_{t \in \mathbb{Z}}$ by
setting

$$Y_t := Y(\tau_t), \quad \tau_t := t(\Delta t). \tag{9.1}$$

Using this notation the loss is written as

$$L_{t+1} := -(V(\tau_{t+1}) - V(\tau_t)) = -(V_{t+1} - V_t). \tag{9.2}$$

In market-risk management we often work with valuation models (such as Black-Scholes)
where calendar time is measured in years and interest rates and volatilities
are quoted on an annualized basis. In this case, if we are interested in daily losses,
we set $\Delta t = 1/365$ or $\Delta t \approx 1/250$; the latter convention is mainly used in markets
for equity derivatives since there are approximately 250 trading days per year. The
rvs $V_t$ and $V_{t+1}$ then represent the portfolio value on days $t$ and $t + 1$, respectively,
and $L_{t+1}$ is the loss from day $t$ to day $t + 1$.

As explained in Section 2.2.1 the value $V_t$ is modelled as a function of time and a
$d$-dimensional random vector $Z_t = (Z_{t,1}, \dots, Z_{t,d})'$ of risk factors. This procedure
is referred to as mapping. Using the canonical units of time for the valuation model
(typically years), mapping leads to an equation of the form

$$V_t = g(\tau_t, Z_t) \tag{9.3}$$

for some measurable function $g\colon \mathbb{R}_+ \times \mathbb{R}^d \to \mathbb{R}$ and some vector of appropriate
risk factors $Z_t$. The choice of the function $g$ and risk factors $Z_t$ reflects the structure
of the portfolio and also the desired level of precision in the modelling of risk.
Note that by introducing $f(t, Z_t) := g(\tau_t, Z_t)$ we can get the simpler version
of the mapping formula used in (2.2). However, the use of the $g(\tau_t, Z_t)$ notation
allows us more flexibility to map positions while preserving market conventions
with respect to timescale; see, for example, the mapping of a zero-coupon bond
portfolio in Section 9.1.3.

Recall that the risk-factor changes $(X_t)_{t \in \mathbb{Z}}$ are given by $X_t := Z_t - Z_{t-1}$. Using
the mapping (9.3) the portfolio loss can be written as

$$L_{t+1} = -\bigl(g(\tau_{t+1}, Z_t + X_{t+1}) - g(\tau_t, Z_t)\bigr). \tag{9.4}$$

Since $Z_t$ is known at time $t$, the loss distribution at time $t$ is determined by the
distribution of the risk-factor change $X_{t+1}$. We therefore introduce a new piece of
notation in this chapter, namely the loss operator at time $t$, written $l_{[t]}\colon \mathbb{R}^d \to \mathbb{R}$,
which maps risk-factor changes into losses. It is defined by

$$l_{[t]}(x) := -\bigl(g(\tau_{t+1}, z_t + x) - g(\tau_t, z_t)\bigr), \quad x \in \mathbb{R}^d, \tag{9.5}$$

where $z_t$ denotes the realized value of $Z_t$ at time $t$; we obviously have
$L_{t+1} = l_{[t]}(X_{t+1})$. The loss operator will facilitate our discussion of statistical
approaches to measuring market risk in Section 9.2.

Note that, while we use lowercase $z_t$ in (9.5) to emphasize that the loss operator
is a function of known risk-factor values at time $t$, we will not apply this convention
strictly in later examples.
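
In code, the loss operator (9.5) is naturally expressed as a closure: the mapping $g$, the current time and the realized risk-factor values are fixed, and the returned function takes only a risk-factor change. The sketch below is illustrative; `make_loss_operator` and the one-stock mapping `g_stock` are hypothetical names introduced here, not notation from the text.

```python
import math
from typing import Callable, Sequence

def make_loss_operator(g: Callable[[float, Sequence[float]], float],
                       tau_t: float,
                       z_t: Sequence[float],
                       dt: float) -> Callable[[Sequence[float]], float]:
    """Return the loss operator x -> -(g(tau_t + dt, z_t + x) - g(tau_t, z_t))."""
    value_now = g(tau_t, z_t)                       # known portfolio value at time t

    def loss(x: Sequence[float]) -> float:
        z_next = [z + dx for z, dx in zip(z_t, x)]  # risk factors after the change x
        return -(g(tau_t + dt, z_next) - value_now)

    return loss

# Hypothetical mapping: one share of stock with the log-price as the single risk factor.
def g_stock(tau: float, z: Sequence[float]) -> float:
    return math.exp(z[0])

loss_op = make_loss_operator(g_stock, tau_t=0.0, z_t=[math.log(100.0)], dt=1 / 250)
print(loss_op([-0.01]))   # loss for a log-return of -1%
print(loss_op([0.0]))     # no change in the risk factor
```

Statistical methods for market risk then differ only in how they generate the risk-factor changes that are fed into such a function.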


9.1.2 Delta and Delta-Gamma Approximations 


If the mapping function $g$ is differentiable and $\Delta t$ is relatively small, we can
approximate $g$ with a first-order Taylor series approximation

$$g(\tau_t + \Delta t, z_t + x) \approx g(\tau_t, z_t) + g_\tau(\tau_t, z_t)\,\Delta t + \sum_{i=1}^d g_{z_i}(\tau_t, z_t)\,x_i, \tag{9.6}$$

where the $\tau$ subscript denotes the partial derivative with respect to the time argument
of the mapping and the $z_i$ subscripts denote the partial derivatives with respect to
the risk factors. This allows us to approximate the loss operator in (9.5) by the linear
loss operator at time $t$, which is given by

$$l^{\Delta}_{[t]}(x) := -\Bigl(g_\tau(\tau_t, z_t)\,\Delta t + \sum_{i=1}^d g_{z_i}(\tau_t, z_t)\,x_i\Bigr). \tag{9.7}$$

Note that, when working with a short time horizon $\Delta t$, the term $g_\tau(\tau_t, z_t)\,\Delta t$ is very
small and is sometimes omitted in practice.
We can also develop a second-order Taylor series, or so-called delta-gamma,
approximation. Suppose we introduce vector notation

$$\delta(\tau_t, z_t) = \bigl(g_{z_1}(\tau_t, z_t), \dots, g_{z_d}(\tau_t, z_t)\bigr)'$$

for the first-order partial derivatives of the mapping with respect to the risk factors.
For the second-order partial derivatives let

$$\omega(\tau_t, z_t) = \bigl(g_{z_1 \tau}(\tau_t, z_t), \dots, g_{z_d \tau}(\tau_t, z_t)\bigr)'$$

denote the vector of mixed partial derivatives with respect to time and the risk factors
and let $\Gamma(\tau_t, z_t)$ denote the matrix with $(i, j)$th element given by $g_{z_i z_j}(\tau_t, z_t)$; this
matrix contains gamma sensitivities to individual risk factors on the diagonal and
cross gamma sensitivities to pairs of risk factors off the diagonal. The full
second-order approximation of $g$ is

$$\begin{aligned}
g(\tau_t + \Delta t, z_t + x) \approx{}& g(\tau_t, z_t) + g_\tau(\tau_t, z_t)\,\Delta t + \delta(\tau_t, z_t)' x \\
&+ \tfrac{1}{2}\bigl(g_{\tau\tau}(\tau_t, z_t)(\Delta t)^2 + 2\,\omega(\tau_t, z_t)' x\,\Delta t + x'\,\Gamma(\tau_t, z_t)\,x\bigr).
\end{aligned} \tag{9.8}$$

In practice, we would usually omit terms of order $o(\Delta t)$ (terms that tend to zero
faster than $\Delta t$). In the above expression this is the term in $(\Delta t)^2$ and, if we assume
that risk factors follow a standard continuous-time financial model such as
Black-Scholes or many generalizations thereof, the term in $x\,\Delta t$.

To understand better why the last statement is true, consider the case of the
Black-Scholes model. The log stock price at time $\tau_t$ is given by
$\ln S_{\tau_t} = (\mu - \tfrac{1}{2}\sigma^2)\tau_t + \sigma W_{\tau_t}$,
where $\mu$ is the drift, $\sigma$ is the volatility, $\tau_t = t(\Delta t)$ as usual and $W_{\tau_t}$ denotes Brownian
motion. It follows that the risk-factor change satisfies

$$X_{t+1} = \ln\Bigl(\frac{S_{\tau_{t+1}}}{S_{\tau_t}}\Bigr) \sim N\bigl((\mu - \tfrac{1}{2}\sigma^2)\,\Delta t,\; \sigma^2\,\Delta t\bigr).$$

Clearly, $X_{t+1}/(\sigma\sqrt{\Delta t})$ converges in distribution to a standard normal variable as
$\Delta t \to 0$. Risk-factor changes $x$ in this model are therefore of order $O(\sqrt{\Delta t})$,
meaning they tend to zero at the same rate as $\sqrt{\Delta t}$. It follows that the term $x\,\Delta t$
tends to zero at the same rate as $(\Delta t)^{3/2}$ and is therefore a term of order $o(\Delta t)$.
Omitting terms of order $o(\Delta t)$ in (9.8) leaves us with the quadratic loss operator

$$l^{\Delta\Gamma}_{[t]}(x) := -\bigl(g_\tau(\tau_t, z_t)\,\Delta t + \delta(\tau_t, z_t)' x + \tfrac{1}{2}\,x'\,\Gamma(\tau_t, z_t)\,x\bigr), \tag{9.9}$$

and this typically provides a more accurate approximation to (9.5) than the linear
loss operator. In Example 9.1 below we give an application of the delta-gamma
approximation (9.9).
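
When analytical sensitivities are not available, the linear and quadratic loss operators (9.7) and (9.9) can be approximated by estimating the partial derivatives of $g$ with central finite differences. The sketch below makes that substitution explicitly; the function name `delta_gamma_losses`, the step size `eps` and the single-bond test mapping `g_bond` are assumptions for illustration.

```python
import numpy as np

def delta_gamma_losses(g, tau, z, x, dt, eps=1e-4):
    """Return (linear, quadratic) loss approximations in the spirit of (9.7) and (9.9).

    g   : mapping function g(tau, z) returning the portfolio value as a float
    tau : current time in valuation units (e.g. years)
    z   : current risk-factor values; x: risk-factor changes; dt: time horizon
    All derivatives are estimated by central finite differences with step eps.
    """
    z = np.asarray(z, dtype=float)
    x = np.asarray(x, dtype=float)
    d = z.size
    step = np.eye(d) * eps

    g_tau = (g(tau + eps, z) - g(tau - eps, z)) / (2 * eps)
    delta = np.array([(g(tau, z + step[i]) - g(tau, z - step[i])) / (2 * eps)
                      for i in range(d)])
    gamma = np.empty((d, d))
    for i in range(d):
        for j in range(d):
            gamma[i, j] = (g(tau, z + step[i] + step[j]) - g(tau, z + step[i] - step[j])
                           - g(tau, z - step[i] + step[j]) + g(tau, z - step[i] - step[j])
                           ) / (4 * eps**2)

    linear = -(g_tau * dt + delta @ x)
    quadratic = linear - 0.5 * (x @ gamma @ x)
    return float(linear), float(quadratic)

# Hypothetical test mapping: one zero-coupon bond maturing at T = 5 with yield z[0].
def g_bond(tau, z):
    return float(np.exp(-(5.0 - tau) * z[0]))

tau, z, x, dt = 0.0, [0.03], [0.01], 1 / 250
linear, quadratic = delta_gamma_losses(g_bond, tau, z, x, dt)
exact = -(g_bond(tau + dt, np.array(z) + np.array(x)) - g_bond(tau, np.array(z)))
print("linear:", linear, "quadratic:", quadratic, "full revaluation:", exact)
```

For this one-percentage-point rise in yield the quadratic approximation lies much closer to the full-revaluation loss than the linear one, which is the pattern the text describes.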


Example 9.1 (European call option). The set-up and notation in this example are 
similar to those of Example 2.2 but we now consider a European call option that 
has been sold by a bank and delta-hedged to remove some of the risk. This means 
that the bank has bought a quantity of stock equivalent to the delta of the option so 
that the first-order sensitivity of the hedged position to stock price changes is 0. To 
simplify the analysis of risk factors, we assume that the interest rate r is constant. 

Using the time-series notation in (9.1), the value of the hedged position at time $t$
is

$$V_t = S_t h_t - C^{\mathrm{BS}}(\tau_t, S_t; r, \sigma_t, K, T), \tag{9.10}$$

where $S_t$ and $\sigma_t$ are the stock price and implied volatility at $t$, $K$ is the strike price,
$T$ is the maturity and $h_t = C^{\mathrm{BS}}_S(\tau_t, S_t; r, \sigma_t, K, T)$ is the delta of the option. The
time horizon of interest is one day and the natural time unit in the Black-Scholes
formula is years, so $\Delta t = 1/250$ and $\tau_t = t/250$.
The valuation formula (9.10) is of the form (9.3) with risk factors $Z_t = (\ln S_t, \sigma_t)'$.
The linear loss operator (9.7) is given by

$$l^{\Delta}_{[t]}(x) = C^{\mathrm{BS}}_\tau\,\Delta t + C^{\mathrm{BS}}_\sigma\,x_2$$

since $g_{z_1}(\tau_t, z_t) = (h_t - C^{\mathrm{BS}}_S) S_t = 0$.

Consider the situation where the time to expiry is $T - \tau_t = 1$, the strike price is
100 and the interest rate is $r = 0.02$. Moreover, assume that the current stock price
is $S_t = 110$, so that the option is in the money, and the current implied volatility is
$\sigma_t = 0.2$. The values of the Greeks in the Black-Scholes model may be calculated
using well-known formulas (see Notes and Comments): they are $C^{\mathrm{BS}}_\tau \approx -4.83$
and $C^{\mathrm{BS}}_\sigma \approx 34.91$. Suppose we consider the effect of risk-factor changes
$x = (0.05, 0.02)'$ representing a stock return of (approximately) 5% and an increase in
implied volatility of 2%. The stock return obviously makes no contribution to the
linearized loss, which is given by

$$l^{\Delta}_{[t]}(x) = C^{\mathrm{BS}}_\tau \cdot (1/250) + C^{\mathrm{BS}}_\sigma \cdot 0.02 \approx -0.019 + 0.698 = 0.679.$$

On the other hand, if we use full revaluation of the option at time $t + 1$, the loss
would be given by $l_{[t]}(x) \approx 0.812$. So, for risk-factor changes of this order,
there is a 16% underestimate involved in linearization.

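To make a second-order approximation in this case we need to compute gamma,
the second derivative $C^{\mathrm{BS}}_{SS}$ with respect to stock price, the second derivative $C^{\mathrm{BS}}_{\sigma\sigma}$
with respect to volatility, and the mixed derivative $C^{\mathrm{BS}}_{S\sigma}$ with respect to stock price
and volatility. This gives the quadratic loss operator

$$l^{\Delta\Gamma}_{[t]}(x) = C^{\mathrm{BS}}_\tau\,\Delta t + C^{\mathrm{BS}}_\sigma\,x_2 + \tfrac{1}{2} C^{\mathrm{BS}}_{SS} S_t^2 x_1^2 + C^{\mathrm{BS}}_{S\sigma} S_t x_1 x_2 + \tfrac{1}{2} C^{\mathrm{BS}}_{\sigma\sigma} x_2^2,$$

where we note that the $S_t^2$ and $S_t$ factors enter the third and fourth terms because
the risk factor is $\ln S_t$ rather than $S_t$. In the numerical example,

$$\begin{aligned}
l^{\Delta\Gamma}_{[t]}(x) &= l^{\Delta}_{[t]}(x) + \tfrac{1}{2} C^{\mathrm{BS}}_{SS} S_t^2 x_1^2 + C^{\mathrm{BS}}_{S\sigma} S_t x_1 x_2 + \tfrac{1}{2} C^{\mathrm{BS}}_{\sigma\sigma} x_2^2 \\
&\approx 0.679 + 0.218 - 0.083 + 0.011 = 0.825.
\end{aligned}$$

This is less than a 2% overestimate of the true loss, which is a substantially more
accurate assessment of the impact of $x$. The inclusion of the gamma of the option
$C^{\mathrm{BS}}_{SS}$ is particularly important.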

This example shows that the additional complexity of second-order approxima- 
tions may often be warranted. Note, however, that delta-gamma approximations can 
give very poor results when applied to longer time horizons with large risk-factor 
changes. 
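
The figures in Example 9.1 can be reproduced with the standard closed-form Black-Scholes price and Greeks for a European call on a non-dividend-paying stock. The sketch below is a check under that assumption; the helper names (`bs_call`, `norm_cdf`, `norm_pdf`) and variable names are illustrative and are not notation from the text.

```python
import math

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def bs_call(S, K, r, sigma, ttm):
    """Black-Scholes call price with time to maturity ttm; also returns d1 and d2."""
    d1 = (math.log(S / K) + (r + 0.5 * sigma**2) * ttm) / (sigma * math.sqrt(ttm))
    d2 = d1 - sigma * math.sqrt(ttm)
    return S * norm_cdf(d1) - K * math.exp(-r * ttm) * norm_cdf(d2), d1, d2

# Inputs of Example 9.1.
S, K, r, sigma, ttm, dt = 110.0, 100.0, 0.02, 0.2, 1.0, 1.0 / 250.0
x1, x2 = 0.05, 0.02                      # changes in log-price and implied volatility

price, d1, d2 = bs_call(S, K, r, sigma, ttm)
delta = norm_cdf(d1)                     # hedge ratio h_t
theta = (-S * norm_pdf(d1) * sigma / (2.0 * math.sqrt(ttm))
         - r * K * math.exp(-r * ttm) * norm_cdf(d2))   # derivative in calendar time
vega = S * norm_pdf(d1) * math.sqrt(ttm)
gamma = norm_pdf(d1) / (S * sigma * math.sqrt(ttm))
vanna = -norm_pdf(d1) * d2 / sigma       # mixed derivative in stock price and volatility
volga = vega * d1 * d2 / sigma           # second derivative in volatility

# Full revaluation of the hedged position after one day.
S_new = S * math.exp(x1)
price_new, _, _ = bs_call(S_new, K, r, sigma + x2, ttm - dt)
full = -(delta * (S_new - S) - (price_new - price))

linear = theta * dt + vega * x2
quadratic = (linear + 0.5 * gamma * S**2 * x1**2
             + vanna * S * x1 * x2 + 0.5 * volga * x2**2)

print(f"theta={theta:.2f} vega={vega:.2f}")
print(f"linear={linear:.3f} quadratic={quadratic:.3f} full={full:.3f}")
```

The script separates the three second-order terms, which makes it easy to see that the gamma term dominates the correction to the linear approximation.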


9.1.3 Mapping Bond Portfolios 


In this section we apply the ideas of Section 9.1.2 to the mapping of a portfolio of 
bonds and relate this to the classical concepts of duration and convexity in the risk 
management of bond portfolios. 




Basic definitions for bond pricing. In standard bond pricing notation, p(t, T) 
denotes the price at time $t$ of a default-free zero-coupon bond with maturity $T$.
While zero-coupon bonds of long maturities are relatively rare in practice, many 
other fixed-income instruments such as coupon bonds or standard swaps can be 
viewed as portfolios of zero-coupon bonds, and zero-coupon bonds are therefore 
fundamental building blocks for studying interest-rate risk. We follow a standard 
convention in modern interest-rate theory and normalize the face value p(T, T) of 
the bond to 1, and we measure time in years. 

The mapping $T \mapsto p(t, T)$ for different maturities is one way of describing the
so-called term structure of interest rates at time $t$. An alternative description is based
on yields. The continuously compounded yield of a zero-coupon bond is defined to
be $y(t, T) = -(1/(T - t)) \ln p(t, T)$, so that we have the relationship

$$p(t, T) = \exp(-(T - t)\,y(t, T)).$$

The mapping $T \mapsto y(t, T)$ is referred to as the continuously compounded yield
curve at time $t$. Yields are a popular way of describing the term structure because
they are comparable across different times to maturity due to the rescaling by $(T - t)$;
they are generally expressed on an annualized basis.

We now consider the mapping of a portfolio of zero-coupon bonds. Note that the 
same mapping structure would be obtained for a single coupon bond, a portfolio of 
coupon bonds or any portfolio of promised cash flows at fixed future times. 


Detailed mapping of a bond portfolio. We consider a portfolio of $d$ default-free
zero-coupon bonds with maturities $T_i$ and prices $p(t, T_i)$, $1 \le i \le d$. By $\lambda_i$ we
denote the number of bonds with maturity $T_i$ in the portfolio.

In a detailed analysis of the change in value of the bond portfolio, one takes all
yields $y(t, T_i)$, $1 \le i \le d$, as risk factors. The value of the portfolio at time $t$ is
given by

$$V(t) = \sum_{i=1}^d \lambda_i\,p(t, T_i) = \sum_{i=1}^d \lambda_i \exp(-(T_i - t)\,y(t, T_i)). \tag{9.11}$$

Switching to a discrete-time set-up using (9.1), the mapping (9.3) of the bond
portfolio can be written as

$$V_t = g(\tau_t, Z_t) = \sum_{i=1}^d \lambda_i \exp(-(T_i - \tau_t)\,Z_{t,i}), \tag{9.12}$$

where $\tau_t = t(\Delta t)$, $\Delta t$ is the time horizon expressed in years, and the risk factors
are the yields $Z_{t,i} = y(\tau_t, T_i)$, $1 \le i \le d$. The risk-factor changes are the changes
in yields $X_{t+1,i} = y(\tau_{t+1}, T_i) - y(\tau_t, T_i)$, $1 \le i \le d$.

From (9.12) the loss operator $l_{[t]}$ and its linear and quadratic approximations can
easily be computed. The first derivatives of the mapping function are

$$g_\tau(\tau_t, z_t) = \sum_{i=1}^d \lambda_i\,p(\tau_t, T_i)\,z_{t,i},$$

$$g_{z_i}(\tau_t, z_t) = -\lambda_i (T_i - \tau_t) \exp(-(T_i - \tau_t)\,z_{t,i}).$$
Inserting these into (9.7) and reverting to standard bond pricing notation we obtain

$$l^{\Delta}_{[t]}(x) = -\sum_{i=1}^d \lambda_i\,p(\tau_t, T_i)\bigl(y(\tau_t, T_i)\,\Delta t - (T_i - \tau_t)\,x_i\bigr), \tag{9.13}$$

where $x_i$ represents the change in yield of the $i$th bond.
For the second-order approximation we need the second derivatives with respect
to yields, which are

$$g_{z_i z_i}(\tau_t, z_t) = \lambda_i (T_i - \tau_t)^2 \exp(-(T_i - \tau_t)\,z_{t,i})$$

and $g_{z_i z_j}(\tau_t, z_t) = 0$ for $i \ne j$. Using standard bond pricing notation, the quadratic
loss operator in (9.9) is

$$l^{\Delta\Gamma}_{[t]}(x) = -\sum_{i=1}^d \lambda_i\,p(\tau_t, T_i)\bigl(y(\tau_t, T_i)\,\Delta t - (T_i - \tau_t)\,x_i + \tfrac{1}{2}(T_i - \tau_t)^2 x_i^2\bigr). \tag{9.14}$$

Relationship to duration and convexity. The approximations (9.13) and (9.14) can
be interpreted in terms of the classical notions of the duration and convexity of bond
portfolios. To make this connection consider a very simple model for the yield curve
at time $t$ in which

$$y(\tau_{t+1}, T_i) = y(\tau_t, T_i) + x \tag{9.15}$$

for all maturities $T_i$. In this model we assume that a parallel shift in level takes place
along the entire yield curve, an assumption that is unrealistic but that is frequently
made in practice.

Obviously, when (9.15) holds, the loss operators in (9.13) and (9.14) are functions 
of a scalar variable x (the size of the shift). We can express (9.13) in terms of the 
classical concept of the duration of a bond portfolio by writing 


IAE) = —Vi (A At — D;x), (9.16) 


where 


d d 
D= HeT o =) A=), HPT y(n, Ti). 
i=l i=l 
The term that interests us here is D;, which is usually called the (Macaulay) duration 
of the bond portfolio. It is a weighted sum of the times to maturity of the different 
cash flows in the portfolio, the weights being proportional to the discounted values 
of the cash flows. 

Over short time intervals the At term in (9.16) will be negligible and losses of 
value in the bond portfolio will be determined by /[;](x) ~ vu; Dix, so that increases 
in the level of the yield curve lead to losses and decreases lead to gains (assuming all 
positions are long so that A; > 0 for alli). The duration D; can be thought of as the 
bond pricing analogue of the delta of an option; to a first-order approximation, losses 
will be governed by D,. Any two bond portfolios with equal value and duration will 
be subject to similar losses when there is a small parallel shift of the yield curve, 
regardless of differences in the exact composition of the portfolios. 




Duration is an important tool in traditional bond-portfolio or asset-liability man- 
agement. The standard duration-based strategy to manage the interest-rate risk of a 
bond portfolio is called immunization. Under this strategy an asset manager, who 
has a certain amount of funds to invest in various bonds and who needs to make 
certain known payments in the future, allocates these funds to various bonds in such 
a way that the duration of the overall portfolio consisting of bond investments and 
liabilities is equal to zero. As we have just seen, duration measures the sensitivity 
of the portfolio value with respect to shifts in the level of the yield curve. A zero 
duration therefore means that the position has been immunized against changes in 
level. However, the portfolio is still exposed to other types of yield-curve changes, 
such as changes in slope and curvature. 

It is possible to get more accurate approximations for the loss in a bond portfolio 
by considering second-order effects. The analogue of the gamma of an option is the 
concept of convexity. Under our model (9.15) for changes in the level of yields, the
expression for the quadratic loss operator in (9.14) becomes

$$l^{\Delta\Gamma}_{[t]}(x) = -V_t \bigl( A_t \Delta t - D_t x + \tfrac{1}{2} C_t x^2 \bigr), \tag{9.17}$$

where

$$C_t = \sum_{i=1}^{d} \frac{\lambda_i p(t, T_i)}{V_t} (T_i - t)^2$$

is the convexity of the bond portfolio. The convexity is a weighted average of the
squared times to maturity and is the negative of the derivative of the duration with
respect to yield. Consider two portfolios (1) and (2) with identical values $V_t$ and
durations $D_t$. Assume that the convexity of portfolio (1) is greater than that of
portfolio (2), so that $C_t^{(1)} > C_t^{(2)}$. Ignoring terms in $\Delta t$, the difference in loss
operators satisfies

$$l^{\Delta\Gamma,(1)}_{[t]}(x) - l^{\Delta\Gamma,(2)}_{[t]}(x) \approx -\tfrac{1}{2} V_t \bigl( C_t^{(1)} - C_t^{(2)} \bigr) x^2 < 0.$$

In other words, an increase in the level of yields will lead to smaller losses for
portfolio (1), and a decrease in the level of yields will lead to larger gains (since
$-l^{\Delta\Gamma,(1)}_{[t]}(x) > -l^{\Delta\Gamma,(2)}_{[t]}(x)$). For this reason portfolio managers often take steps to
construct portfolios with relatively high convexity. Roughly speaking, this is done
by spreading out the cash flows as much as possible (see Notes and Comments).
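
The effect of convexity can be illustrated with a small numerical sketch. The code below computes the portfolio value, the weighted yield, the duration and the convexity, and evaluates the parallel-shift loss operators (9.16) and (9.17). The flat 3% yield curve, the single five-year bond ("bullet") and the one-year/nine-year combination ("barbell") are hypothetical choices made so that both portfolios have the same value and duration.

```python
import math

def portfolio_summary(lam, T, y, t):
    """Return (V_t, A_t, D_t, C_t) for a zero-coupon bond portfolio."""
    vals = [n_i * math.exp(-(T_i - t) * y_i) for n_i, T_i, y_i in zip(lam, T, y)]
    V = sum(vals)
    w = [v / V for v in vals]                      # value weights
    A = sum(w_i * y_i for w_i, y_i in zip(w, y))
    D = sum(w_i * (T_i - t) for w_i, T_i in zip(w, T))
    C = sum(w_i * (T_i - t) ** 2 for w_i, T_i in zip(w, T))
    return V, A, D, C

def loss_parallel_shift(V, A, D, C, dt, x, second_order=True):
    """Loss under a parallel shift x: (9.16), or (9.17) if second_order is True."""
    loss = -V * (A * dt - D * x)
    if second_order:
        loss -= 0.5 * V * C * x ** 2
    return loss

y_flat = 0.03  # assumed flat yield curve

# Bullet: one unit of a 5-year bond.
V_bullet, A_bullet, D_bullet, C_bullet = portfolio_summary([1.0], [5.0], [y_flat], 0.0)

# Barbell: half the value in a 1-year bond and half in a 9-year bond.
half = 0.5 * V_bullet
lam_barbell = [half / math.exp(-1.0 * y_flat), half / math.exp(-9.0 * y_flat)]
V_barbell, A_barbell, D_barbell, C_barbell = portfolio_summary(
    lam_barbell, [1.0, 9.0], [y_flat, y_flat], 0.0)

for x in (0.01, -0.01):
    l_bullet = loss_parallel_shift(V_bullet, A_bullet, D_bullet, C_bullet, 0.0, x)
    l_barbell = loss_parallel_shift(V_barbell, A_barbell, D_barbell, C_barbell, 0.0, x)
    print(f"shift {x:+.2%}: bullet loss {l_bullet:.5f}, barbell loss {l_barbell:.5f}")
```

Both portfolios have duration 5, but the barbell has the higher convexity, so its second-order loss is smaller for a shift in either direction.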


9.1.4 Factor Models for Bond Portfolios 


For large portfolios of fixed-income instruments, such as the overall fixed-income 
position of a major bank, modelling changes in the yield for every cash flow maturity 
date becomes impractical. Moreover, the statistical task of estimating a distribution 
for $X_{t+1}$ is difficult because the yields are highly dependent for different times to
maturity. A pragmatic approach is therefore to build a factor model for yields that
captures the main features of the evolution of the yield curve. Three-factor models
of the yield curve in which the factors typically represent level, slope and curvature
are often used in practice.

In this section we describe two different approaches to approximating the loss
operator for a bond portfolio, and we show how statistical analysis techniques from
the area of factor modelling are used to calibrate the approximating functions.


The approach based on the Nelson and Siegel (1987) model. The Nelson-Siegel
model is usually formulated in terms of instantaneous forward interest rates, which
are defined from prices by

$$f(t, T) = -\frac{\partial}{\partial T} \ln p(t, T).$$

These can be interpreted as representing the rates that are offered at time $t$ for
borrowing at future times $T$. Yields are related to forward rates by

$$y(t, T) = \frac{1}{T - t} \int_t^T f(t, u) \, du. \tag{9.18}$$

In the Nelson-Siegel approach the forward curve is modelled by

$$f(\tau_t, T) = Z_{t,1} + Z_{t,2} \exp(-\eta_t (T - \tau_t)) + Z_{t,3} \eta_t (T - \tau_t) \exp(-\eta_t (T - \tau_t)),$$

where the factors are $(Z_{t,1}, Z_{t,2}, Z_{t,3})$ and $\eta_t$ is a positive rate parameter, which is
chosen to give the best fit to forward rate data. The relationship (9.18) between the
forward and yield curves implies that

$$y(\tau_t, T) = Z_{t,1} + k_1(T - \tau_t, \eta_t) Z_{t,2} + k_2(T - \tau_t, \eta_t) Z_{t,3}, \tag{9.19}$$

where the functions $k_1$ and $k_2$ are given by

$$k_1(s, \eta) = \frac{1 - e^{-\eta s}}{\eta s}, \qquad k_2(s, \eta) = k_1(s, \eta) - e^{-\eta s}.$$

These functions are illustrated in Figure 9.1. We now give an economic interpretation
of the factors.

Clearly, $\lim_{s \to \infty} k_1(s, \eta) = \lim_{s \to \infty} k_2(s, \eta) = 0$, while $\lim_{s \to 0} k_1(s, \eta) = 1$
and $\lim_{s \to 0} k_2(s, \eta) = 0$. It follows that $\lim_{T \to \infty} y(\tau_t, T) = Z_{t,1}$, so that the
first factor is usually interpreted as a long-term level factor. $Z_{t,2}$ is interpreted as
a slope factor because the difference between short-term yield and long-term yield
satisfies $\lim_{T \to \tau_t} y(\tau_t, T) - \lim_{T \to \infty} y(\tau_t, T) = Z_{t,2}$; $Z_{t,3}$ has an interpretation as
a curvature factor.

Using the factor model (9.19), the mapping (9.11) for the bond portfolio becomes

$$V_t = g(\tau_t, Z_t) = \sum_{i=1}^{d} \lambda_i \exp(-(T_i - \tau_t) k_{t,i}' Z_t),$$

where $k_{t,i} = (1, k_1(T_i - \tau_t, \eta_t), k_2(T_i - \tau_t, \eta_t))'$. It is then straightforward to derive
the loss operator $l_{[t]}(x)$ or its linear version $l^{\Delta}_{[t]}(x)$, which, in contrast to (9.13), are
functions on $\mathbb{R}^3$ rather than $\mathbb{R}^d$ ($d$ is the number of bonds in the portfolio).

To use this method to evaluate the linear loss operator at time $t$ in practice, we
require realized values $z_t$ for the risk factors $Z_t$. However, we have to overcome the
fact that the Nelson-Siegel factors $Z_t$ are not directly observed at time $t$. Instead
they have to be estimated from observable yield curve data.

Figure 9.1. The Nelson-Siegel functions $k_1(s, \eta)$ and $k_2(s, \eta)$
for an $\eta$ value of 0.623 (see also Example 9.2).

Let us suppose that at time $t$ we have the data vector

$$Y_t = (y(\tau_t, \tau_t + s_1), \ldots, y(\tau_t, \tau_t + s_m))'$$

giving the yields for $m$ different times to maturity, $s_1, \ldots, s_m$, where $m$ is large. This
is assumed to follow the factor model $Y_t = B_t Z_t + \varepsilon_t$, where $B_t \in \mathbb{R}^{m \times 3}$ is the
matrix with $i$th row consisting of $(1, k_1(s_i, \eta_t), k_2(s_i, \eta_t))$ and $\varepsilon_t \in \mathbb{R}^m$ is an error
vector. This model fits into the framework of the general factor model in (6.50).

For a given value of $\eta_t$ the estimation of $Z_t$ can be carried out as a cross-sectional
regression using weighted least squares. To estimate $\eta_t$, a more complicated optimization
is carried out (see Notes and Comments). We now show how the method
works for real market yield data.


Example 9.2 (Nelson-Siegel factor model of yield curve). The data are daily
Canadian zero-coupon bond yields for 120 different quarterly maturities ranging
from 0.25 years to 30 years. They have been generated using pricing data for Government
of Canada bonds and treasury bills. We model the yield curve on 8 August
2011. The estimated values are $z_{t,1} = 3.82$, $z_{t,2} = -2.75$, $z_{t,3} = -5.22$ and
$\hat{\eta}_t = 0.623$. The curves $k_1(s, \eta)$ and $k_2(s, \eta)$ are therefore as shown in Figure 9.1.
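
The cross-sectional regression step for a fixed value of the rate parameter can be sketched in a few lines. This is only an illustration under stated assumptions: the yields are synthetic (generated from the factor values quoted in Example 9.2 plus artificial noise, not the Canadian data), the fit uses ordinary rather than weighted least squares, and the function names are hypothetical.

```python
import numpy as np

def k1(s, eta):
    """Nelson-Siegel slope loading (1 - exp(-eta*s)) / (eta*s)."""
    return (1.0 - np.exp(-eta * s)) / (eta * s)

def k2(s, eta):
    """Nelson-Siegel curvature loading k1(s, eta) - exp(-eta*s)."""
    return k1(s, eta) - np.exp(-eta * s)

def design_matrix(s, eta):
    """Matrix B with rows (1, k1(s_i, eta), k2(s_i, eta))."""
    s = np.asarray(s, dtype=float)
    return np.column_stack([np.ones_like(s), k1(s, eta), k2(s, eta)])

def fit_factors(yields, s, eta):
    """Least-squares estimate of (Z_1, Z_2, Z_3) for a given eta (equal weights)."""
    B = design_matrix(s, eta)
    z_hat, *_ = np.linalg.lstsq(B, yields, rcond=None)
    return z_hat

# 120 quarterly times to maturity from 0.25 to 30 years, as in the example.
s = np.arange(0.25, 30.25, 0.25)
eta = 0.623
z_true = np.array([3.82, -2.75, -5.22])   # factor values quoted in Example 9.2

# Synthetic yields (in percent): model curve plus assumed measurement noise.
rng = np.random.default_rng(0)
y_obs = design_matrix(s, eta) @ z_true + rng.normal(scale=0.02, size=s.size)

z_hat = fit_factors(y_obs, s, eta)
print("estimated factors:", np.round(z_hat, 3))
```

Estimating the rate parameter as well would require an outer nonlinear optimization over `eta`, which is not attempted here.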


Example 9.2 illustrates the estimation of the Nelson-Siegel factor model at a
single time point $t$. We note that, to make statistical inferences about bond-portfolio
losses based on historical data, we would typically construct historical risk-factor
time series $(Z_u)$ from yield data $(Y_u)$ by estimating cross-sectional regression models
at all times $u$ in some set $\{t - n + 1, \ldots, t\}$. The estimated risk-factor time series
$(Z_u)$ and the corresponding risk-factor changes form the data for the statistical
methods that are the subject of Section 9.2.

Figure 9.2. Canadian yield curve data (points) and fitted Nelson-Siegel curve
for 8 August 2011 (see Example 9.2 for further details).


The approach based on principal component analysis. Another approach to build- 
ing a factor model of the term structure involves the use of principal component 
analysis (PCA). We refer to Section 6.4 for an introduction to this method. The key 
difference to the Nelson—Siegel approach is that here the dimension reduction via 
factor modelling is applied at the level of the changes in yields rather than the yields 
themselves. 

Let $R_{t+1}$ denote the vector of yield changes for the bonds in the portfolio, so that
$R_{t+1,i} = y(\tau_{t+1}, T_i) - y(\tau_t, T_i)$, $1 \le i \le d$. We recall from (6.62) that PCA can
be used to construct approximate factor models of the form

$$R_{t+1} = \mu + \Gamma X_{t+1} + \varepsilon_{t+1}, \tag{9.20}$$

where $X_{t+1}$ is a $p$-dimensional vector of principal components ($p \ll d$), $\Gamma \in \mathbb{R}^{d \times p}$
contains the corresponding loading matrix, $\mu$ is the mean vector of $R_{t+1}$ and $\varepsilon_{t+1}$
is an error vector. The columns of the matrix $\Gamma$ consist of the first $p$ eigenvectors
(ordered by decreasing eigenvalue) of the covariance matrix of $R_{t+1}$. The principal
components $X_{t+1}$ will form the risk-factor changes in our portfolio analysis, hence
our choice of notation.

Typically, the error term is neglected and $\mu \approx 0$, so that we make the approximation
$R_{t+1} \approx \Gamma X_{t+1}$. In the case of the linear loss operator for the bond portfolio
in (9.13), we use the approximation

$$l^{\Delta}_{[t]}(x) = -\sum_{i=1}^{d} \lambda_i p(t, T_i) \bigl( y(t, T_i) \Delta t - (T_i - t) (\Gamma x)_i \bigr), \tag{9.21}$$

so that a function of a $p$-dimensional argument $x$ is substituted for a function of a
$d$-dimensional argument.
To work with this function we require an estimate for the matrix $\Gamma$. This can
be obtained from historical time-series data on yield changes by estimating sample
principal components, as explained in Example 9.3.


Example 9.3 (PCA factor model of yield changes). We again analyse Canadian 
bond yield data as in Example 9.2. To estimate the $\Gamma$ matrix of principal component
loadings we require longitudinal (time-series) data rather than the cross-sectional 
data that were used in the previous example. 

We will assume for simplicity that the times to maturity $T_1 - \tau_t, \ldots, T_d - \tau_t$
of the bonds in the portfolio correspond exactly to the times to maturity $s_1, \ldots, s_d$
available in the historical data set (if not we would make an appropriate selection 
of the data) and that the risk-management horizon At is one day. 

In the Canadian data set we have 2488 days of data spanning the period from
2 January 2002 to 30 December 2011 (ten full trading years); recall that each day
gives rise to a data vector $Y_u = (y(\tau_u, \tau_u + s_1), \ldots, y(\tau_u, \tau_u + s_d))'$ of yields for the
different maturities. In line with (9.20) we analyse the daily returns (first differences)
of these data $R_u = Y_u - Y_{u-1}$ using PCA under the assumption that they form a
stationary time series. (Note that a small error is incurred by analysing daily yield 
changes for yields with fixed times to maturity rather than fixed maturity date, but 
this will be neglected for the purposes of illustration.) 

When we compute the variances of the sample principal components (using the 
same technique as for Figure 6.5), we find that the first component explains 87.0% 
of the variance of the data, the first two components explain 95.9%, and the first 
three components explain 97.5%. We choose to work with the first three principal 
components, meaning that we set $p = 3$. The matrix $\Gamma$ is estimated by $G_1$, a matrix
whose columns are the first three eigenvectors of the sample covariance matrix. We 
recall from (6.63) that the complete eigenvector matrix for the sample covariance 
matrix is denoted by G. 

The first three eigenvectors are shown graphically in Figure 9.3 and lend them- 
selves to a standard interpretation. The first principal component has negative load- 
ings for all maturities; the second has negative loadings up to ten years and positive 
loadings thereafter; the third has positive loadings for very short maturities (less 
than 2.5 years) and very long maturities (greater than 15 years) but negative load- 
ings otherwise. This suggests that the first principal component can be thought of 
as inducing a change in the level of all yields, the second induces a change of slope 
and the third induces a change in the curvature of the yield curve. 


We note that to make statistical inferences about bond-portfolio losses based on
historical data we require a historical time series of risk-factor changes $(X_u)$ for times
$u$ in some set $\{t - n + 1, \ldots, t\}$. These data are not directly observed but are instead
extracted from the time series of sample principal components $(G^{-1}(R_u - \bar{R}))$,
where $\bar{R}$ is the sample mean vector. The risk-factor change data $(X_u)$ are taken to
be the first $p$ component series; these form the data for the statistical methods that
are the subject of Section 9.2.

Figure 9.3. First three principal component loading vectors plotted against time to maturity.
The data are daily changes in yield for Canadian zero-coupon bonds in the ten-year period
2002-11. The horizontal line at zero shows when loadings are positive or negative. See
Example 9.3 for further details.
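
A minimal sketch of the PCA step, under stated assumptions, is given below. It estimates the loading matrix from the eigenvectors of the sample covariance matrix of yield changes and extracts the first $p$ principal component series. The yield changes are synthetic (a level and a slope factor plus noise), not the Canadian data, and the function name is hypothetical.

```python
import numpy as np

def pca_yield_changes(R, p):
    """PCA of an (n x d) array of yield changes.

    Returns the (d x p) loading matrix, the (n x p) principal component series,
    the cumulative proportion of variance explained, and the sample mean.
    """
    R = np.asarray(R, dtype=float)
    R_bar = R.mean(axis=0)
    S = np.cov(R, rowvar=False)                 # sample covariance matrix
    eigvals, G = np.linalg.eigh(S)              # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]           # reorder by decreasing eigenvalue
    eigvals, G = eigvals[order], G[:, order]
    # G is orthogonal, so G^{-1} = G'; row u of the product is (G'(R_u - R_bar))'.
    components = (R - R_bar) @ G
    explained = np.cumsum(eigvals) / eigvals.sum()
    return G[:, :p], components[:, :p], explained, R_bar

# Synthetic daily yield changes for d = 10 maturities over n = 500 days.
rng = np.random.default_rng(1)
s = np.linspace(1.0, 10.0, 10)
level = np.ones_like(s)
slope = (s - s.mean()) / 10.0
factors = rng.normal(size=(500, 2)) * np.array([0.05, 0.02])
R = factors @ np.vstack([level, slope]) + rng.normal(scale=0.002, size=(500, 10))

Gamma_hat, X, explained, R_bar = pca_yield_changes(R, p=2)
print("variance explained by first two components:", round(float(explained[1]), 4))
```

The sign of each eigenvector is arbitrary, so loadings that are negative for all maturities, as reported for the first component in Example 9.3, carry the same information as loadings that are all positive.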


Notes and Comments 


The mapping framework introduced in this section is similar to the approach pio- 
neered by the RiskMetrics Group (see the RiskMetrics Technical Document (JPMor- 
gan 1996) and Mina and Xiao (2001)). The mapping of positions is also discussed 
in Dowd (1998), Jorion (2007) and in Volume III of the Market Risk Analysis series: 
Alexander (2009). The latter series of four volumes is relevant to much of the mate- 
rial of this chapter. 

The use of first- and second-order approximations to the portfolio value (the so- 
called delta-gamma approximation) may be found in Duffie and Pan (1997) and 
Rouvinez (1997) (see also Duffie and Pan 2001). Formulas for the Greeks in the 
Black-Scholes model may be found in a number of textbooks, including Haug 
(1998) and Wilmott (2000). Leoni (2014) is a very readable introduction.

Many standard finance textbooks treat interest-rate risk and bonds. For a detailed
discussion of duration and its use in the management of interest-rate risk, the books
by Jarrow and Turnbull (1999) and Hull (1997) are good starting points. The construction
of fixed-income portfolios with higher convexity (so-called barbell portfolios)
is discussed in Tuckman and Serrat (2011).

More advanced mathematical textbooks on interest-rate modelling include Brigo
and Mercurio (2006), Carmona and Tehranchi (2006) and Filipović (2009).

There are many different approaches to modelling the yield curve. We have con- 
centrated on the parametric factor model proposed by Nelson and Siegel (1987) 
and further developed in Siegel and Nelson (1988) and Svensson (1994). One 
theoretical deficiency of the Nelson—Siegel model is that it is not consistent with 
no-arbitrage pricing theory (Filipović 1999). In more recent work, Christensen, 
Diebold and Rudebusch (2011) have proposed a model that approximates the 
Nelson—Siegel model but also fits into the class of three-factor, arbitrage-free affine 
models. 

Useful information about estimating Nelson—Siegel models in practice can be 
found in Ferstl and Hayden (2011); we have used their R package termstrc to 
carry out the analysis in Example 9.2. Diebold and Li (2006) have developed an 
approach to forecasting the yield curve in which they fit cross-sectional Nelson— 
Siegel factor models to multivariate time series of yields for different times to matu- 
rity and then use vector autoregression (VAR) models to forecast the Nelson—Siegel 
factors and hence the entire yield curve. They report satisfactory results when the
value of $\eta$ is held constant over all cross-sectional regressions. The alternative
factor-model approach based on PCA is discussed in Hull (1997) and Alexander (2001)
(see examples in Section 6.2 of the latter in particular). 

RiskMetrics takes a different approach to the problem of the dimension of bond 
portfolios. A few benchmark yields are taken for each country and a procedure is 
used to approximately map cash flows at days between benchmark points to the 
two nearest benchmark points; we refer to Section 6.2 of the RiskMetrics technical 
document (JPMorgan 1996) for details. 


9.2 Market Risk Measurement 


In this section we discuss methods used in the financial industry to estimate the loss 
distribution and associated risk measures for portfolios subject to market risk. In 
the formal framework of Section 9.1 this amounts to the problem of estimating the 
distribution of $L_{t+1} = l_{[t]}(X_{t+1})$, or a linear or quadratic approximation thereof,
where $X_{t+1}$ is the vector of risk-factor changes from time $t$ to time $t + 1$ and $l_{[t]}$ is
the known loss operator function at time $t$.

The problem comprises two tasks: on the one hand we have the statistical problem
of estimating the distribution of $X_{t+1}$; on the other hand we have the computational
or numerical problem of evaluating the distribution of $L_{t+1} = l_{[t]}(X_{t+1})$. To accomplish
the first task, we first have to consider carefully the nature of the distribution we
wish to estimate, in particular, whether we focus on the conditional or unconditional
distribution of risk-factor changes.


9.2.1 Conditional and Unconditional Loss Distributions


Generally, in market-risk measurement, it is natural to compute conditional measures 
of risk based on the most recent available information about financial markets. In 
this case, the task is to estimate $F_{X_{t+1} \mid \mathcal{F}_t}$, the conditional distribution of risk-factor
changes, given $\mathcal{F}_t$, the sigma algebra representing the available information at time $t$.
In most cases, $\mathcal{F}_t$ is given by $\mathcal{F}_t = \sigma(\{X_s : s \le t\})$, the sigma algebra generated
by past and present risk-factor changes up to and including time $t$. The conditional
loss distribution is the distribution of the loss operator $l_{[t]}(\cdot)$ under $F_{X_{t+1} \mid \mathcal{F}_t}$, that is,
the distribution with df $F_{L_{t+1} \mid \mathcal{F}_t}(l) = P(l_{[t]}(X_{t+1}) \le l \mid \mathcal{F}_t)$.

While the conditional approach is very natural in market-risk measurement, it 
may not always yield a prudent assessment of risk. If we are in the middle of a quiet 
period on financial markets, we may underestimate the possibility of extreme market 
losses, which can result in an overoptimistic view of a firm’s capital adequacy. For 
this reason it can be informative to compute unconditional loss distributions based 
on assumptions of stationary behaviour over longer time windows that (ideally) 
contain previous episodes of market volatility. 

In the unconditional approach we make the assumption that the process of risk-factor
changes $(X_s)_{s \le t}$ forms a stationary multivariate time series. We recall the
definition of a stationary univariate time series from Section 4.1.1; the multivariate
definition is given in Section 14.1. We estimate the stationary distribution $F_X$ of the
time series and then evaluate the unconditional loss distribution of $l_{[t]}(X)$, where
$X$ represents a generic random vector in $\mathbb{R}^d$ with df $F_X$. The unconditional loss
distribution is thus the distribution of the loss operator $l_{[t]}(\cdot)$ under $F_X$.

If the risk-factor changes form an independent and identically distributed (iid)
series, we obviously have $F_{X_{t+1} \mid \mathcal{F}_t} = F_X$, so that the conditional and unconditional
approaches coincide. However, in Section 3.1.1 we have argued that many types
of risk-factor-change data show volatility clustering, which is inconsistent with
iid behaviour. In the stationary models that are used to account for such behaviour,
$F_{X_{t+1} \mid \mathcal{F}_t}$ is not generally equal to the stationary distribution $F_X$. An important example
is provided by the popular models from the GARCH family. In Section 4.2.1
we observed that a simple stationary ARCH(1) model with a conditional normal 
distribution has a leptokurtic stationary distribution, i.e. a non-normal distribution 
with heavier tails. This is also true of more complicated univariate and multivariate 
GARCH models. 
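
The difference between the conditional and the stationary distribution can be seen in a short simulation, offered here only as a sketch. It assumes the usual ARCH(1) recursion $X_t = \sigma_t Z_t$, $\sigma_t^2 = \alpha_0 + \alpha_1 X_{t-1}^2$ with standard normal innovations; the parameter values, seeds and function names are hypothetical. Although each observation is conditionally normal, the sample kurtosis of the simulated series should clearly exceed the normal value of 3.

```python
import math
import random

def simulate_arch1(n, alpha0, alpha1, seed=0, burn_in=1000):
    """Simulate n values of an ARCH(1) process with standard normal innovations."""
    rng = random.Random(seed)
    x_prev = 0.0
    out = []
    for i in range(n + burn_in):
        sigma2 = alpha0 + alpha1 * x_prev ** 2      # conditional variance
        x_prev = math.sqrt(sigma2) * rng.gauss(0.0, 1.0)
        if i >= burn_in:                            # discard the burn-in period
            out.append(x_prev)
    return out

def sample_kurtosis(x):
    """Sample kurtosis m4 / m2^2 (equal to 3 for a normal distribution)."""
    n = len(x)
    m = sum(x) / n
    m2 = sum((v - m) ** 2 for v in x) / n
    m4 = sum((v - m) ** 4 for v in x) / n
    return m4 / m2 ** 2

x_arch = simulate_arch1(100_000, alpha0=0.5, alpha1=0.5, seed=1)
rng_iid = random.Random(2)
x_iid = [rng_iid.gauss(0.0, 1.0) for _ in range(100_000)]

kurt_arch = sample_kurtosis(x_arch)
kurt_iid = sample_kurtosis(x_iid)
print("ARCH(1) sample kurtosis:", round(kurt_arch, 2))
print("iid normal sample kurtosis:", round(kurt_iid, 2))
```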

Since the financial crisis of 2007-9, regulators have called for regular estimates 
of VaR to be supplemented by stressed VaR estimates (Basel Committee on Banking 
Supervision 2013a). Firms are required to estimate VaR using historical data from 
stress periods in the financial markets (such as 2008). Stressed VaR calculations have 
more to do with the choice of historical data than with the choice of conditional or 
unconditional distribution. In principle, a stressed VaR estimate can be computed 
using either approach. The key point is that historical time-series data from stress 
periods are substituted for the up-to-date time-series data from which regular VaR 
estimates are made.


9.2.2 Variance-Covariance Method


This method was originally pioneered by JPMorgan’s RiskMetrics group in the 
early 1990s. Although used by a minority of banks today, it remains an important 
contribution to the development of the methodology for market-risk measurement. 
As mentioned in Section 2.2.3, it is an example of an analytical method in which 
the linearized loss distribution has a known form and estimates of VaR and expected 
shortfall can be computed with simple formulas. 

In the variance-covariance method we assume that the conditional distribution
of risk-factor changes $F_{X_{t+1} \mid \mathcal{F}_t}$ is a multivariate normal distribution with mean vector
$\mu_{t+1}$ and covariance matrix $\Sigma_{t+1}$. In other words, we assume that, given $\mathcal{F}_t$,
$X_{t+1} \sim N_d(\mu_{t+1}, \Sigma_{t+1})$, where $\mu_{t+1}$ and $\Sigma_{t+1}$ are $\mathcal{F}_t$-measurable.

The estimation of $F_{X_{t+1} \mid \mathcal{F}_t}$ can be carried out in a number of ways. We can fit
a (multivariate) time-series model to historical data $X_{t-n+1}, \ldots, X_t$ and use the
fitted model to derive estimates of $\mu_{t+1}$ and $\Sigma_{t+1}$. In Section 4.2.5 we explained the
procedure for univariate GARCH or ARMA-GARCH models (see Examples 4.25
and 4.26 in particular). The same idea carries over to the multivariate GARCH
models of Section 14.2 (see Section 14.2.6 in particular).

Alternatively, and more straightforwardly, the model-free exponentially weighted 
moving-average (EWMA) procedure suggested by the RiskMetrics group can be 
used. The univariate version of this technique was presented in Section 4.2.5 and the 
multivariate version is a simple extension of the idea. Let us suppose that we work in
the context of a multivariate model with conditional mean $\mu_t = E(X_t \mid \mathcal{F}_{t-1}) = 0$.
The conditional covariance matrix $\Sigma_{t+1}$ is estimated recursively by

$$\hat{\Sigma}_{t+1} = \theta X_t X_t' + (1 - \theta) \hat{\Sigma}_t, \tag{9.22}$$

where $\theta$ is a small positive number (typically of the order $\theta \approx 0.04$). The estimator
(9.22) takes the form of a weighted sum of the estimate of $\Sigma_t$ calculated at time
$t - 1$ and a term $X_t X_t'$ that satisfies $E(X_t X_t' \mid \mathcal{F}_{t-1}) = \Sigma_t$. The interpretation is
that the estimate at time $t$ is obtained by perturbing the estimate at time $t - 1$ by a
term that reacts to the “latest information” about the joint variability of risk-factor
changes.

For $n$ large we can calculate that

$$\hat{\Sigma}_{t+1} \approx \theta \sum_{i=0}^{n-1} (1 - \theta)^i X_{t-i} X_{t-i}'.$$

This means that, after the EWMA procedure has been running for a while, the
influence of starting values for the conditional covariance matrix is negligible and
estimates are effectively weighted sums of the matrices $X_t X_t'$ where the weights
decay exponentially. These estimates are usually quite close to estimates derived by
formal multivariate GARCH modelling. The method can be refined by relaxing the
assumption that the conditional mean satisfies $\mu_t = 0$ and including an estimate of
$\mu_t$ obtained by exponential smoothing to get the updating equation

$$\hat{\Sigma}_{t+1} = \theta (X_t - \hat{\mu}_t)(X_t - \hat{\mu}_t)' + (1 - \theta) \hat{\Sigma}_t.$$
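
The zero-mean recursion (9.22) is simple to implement. The sketch below is illustrative only: the function name, the choice of the sample covariance matrix as default starting value and the simulated data are assumptions made for this example, not a prescription from the text.

```python
import numpy as np

def ewma_covariance(X, theta=0.04, sigma0=None):
    """Run the recursion (9.22) over the rows of X (oldest first).

    Returns the estimate of the conditional covariance matrix for the period
    following the last row. If no starting value sigma0 is given, the sample
    covariance matrix of X is used (an assumption made for this sketch).
    """
    X = np.asarray(X, dtype=float)
    if sigma0 is None:
        sigma = np.atleast_2d(np.cov(X, rowvar=False))
    else:
        sigma = np.array(sigma0, dtype=float)
    for x in X:
        sigma = theta * np.outer(x, x) + (1.0 - theta) * sigma
    return sigma

# Hypothetical risk-factor changes: 500 days, 3 risk factors.
rng = np.random.default_rng(2)
X_demo = rng.normal(scale=0.01, size=(500, 3))
sigma_next = ewma_covariance(X_demo, theta=0.04)
print(np.round(sigma_next * 1e4, 3))   # covariance estimate, scaled for display
```

Starting the recursion from a zero matrix reproduces the exponentially weighted sum given above exactly, which offers a convenient check on an implementation.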
The second critical assumption in the variance-covariance method is that the linear
loss operator (9.7) for the portfolio in question is a sufficiently accurate approximation
of the actual loss operator (9.5). The linear loss operator is a function of the
form

$$l^{\Delta}_{[t]}(x) = -(c_t + b_t' x) \tag{9.23}$$

for some constant $c_t$ and constant vector $b_t$, which are known to us at time $t$. We
have seen a number of examples including

• the stock portfolio of Example 2.1, where the loss operator takes the form
$l^{\Delta}_{[t]}(x) = -v_t w_t' x$ and $w_t$ is the vector of portfolio weights at time $t$;

• the European call option of Example 2.2; and

• the zero-coupon bond portfolio with linear loss operator given by (9.13).

An important property of the multivariate normal is that a linear function (9.23)
of a normal vector must have a univariate normal distribution, as discussed in Section
6.1.3. From (6.13) we infer that, conditional on $\mathcal{F}_t$,

$$L^{\Delta}_{t+1} = l^{\Delta}_{[t]}(X_{t+1}) \sim N(-c_t - b_t' \mu_{t+1}, \; b_t' \Sigma_{t+1} b_t). \tag{9.24}$$

VaR and expected shortfall may be easily calculated for the normal loss distribution
in (9.24). For VaR we use formula (2.18) in Example 2.11. For expected shortfall
we use formula (2.24) in Example 2.14.
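
A short sketch of the complete calculation follows. It computes the mean and variance in (9.24) and then applies the standard closed-form expressions for a normal loss, $\mathrm{VaR}_\alpha = \mu + \sigma \Phi^{-1}(\alpha)$ and $\mathrm{ES}_\alpha = \mu + \sigma \phi(\Phi^{-1}(\alpha))/(1 - \alpha)$, which we take to be what the cited formulas provide. The two-stock portfolio, its weights and the covariance matrix are hypothetical, as are the function names.

```python
from statistics import NormalDist

def linear_loss_moments(c, b, mu, Sigma):
    """Mean and variance of L = -(c + b'X) for X ~ N(mu, Sigma), as in (9.24)."""
    d = len(b)
    mean = -c - sum(b[i] * mu[i] for i in range(d))
    var = sum(b[i] * Sigma[i][j] * b[j] for i in range(d) for j in range(d))
    return mean, var

def normal_var_es(mean, var, alpha=0.99):
    """VaR and expected shortfall at level alpha for a normal loss distribution."""
    sd = var ** 0.5
    q = NormalDist().inv_cdf(alpha)                 # standard normal quantile
    var_alpha = mean + sd * q
    es_alpha = mean + sd * NormalDist().pdf(q) / (1.0 - alpha)
    return var_alpha, es_alpha

# Hypothetical stock portfolio: value 1,000,000, weights (0.6, 0.4), so that
# c_t = 0 and b_t = v_t * w_t in (9.23).
v_t = 1_000_000.0
w = [0.6, 0.4]
b = [v_t * w_i for w_i in w]
mu_next = [0.0, 0.0]
Sigma_next = [[1.0e-4, 3.0e-5],
              [3.0e-5, 4.0e-4]]   # assumed daily covariance matrix

loss_mean, loss_var = linear_loss_moments(0.0, b, mu_next, Sigma_next)
var_99, es_99 = normal_var_es(loss_mean, loss_var, alpha=0.99)
print("99% VaR:", round(var_99), " 99% ES:", round(es_99))
```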


Weaknesses of the method and extensions. The variance—covariance method offers 
a simple analytical solution to the risk-measurement problem but this convenience 
is achieved at the cost of two crude simplifying assumptions. First, linearization 
may not always offer a good approximation of the relationship between the true loss 
distribution and the risk-factor changes, particularly for derivative portfolios and 
longer time intervals. Second, the assumption of normality is unlikely to be realistic 
for the conditional distribution of the risk-factor changes for shorter-interval data 
such as daily data and weekly data. Another way of putting this is to say that the 
innovation distribution in a suitable time-series model of such data is generally 
heavier tailed than normal (see Example 4.24). 

The convenience of the variance—covariance method relies on the fact that a linear 
combination of a multivariate Gaussian vector has a univariate Gaussian distribution. 
However, we have seen in Chapter 6 that there are other multivariate distribution 
families that are closed under linear operations, and variance—covariance methods 
can also be developed for these. Examples include multivariate t distributions and 
multivariate generalized hyperbolic distributions (see, in particular, Proposition 6.13 
and (6.45)). 

For example, suppose we model risk-factor changes in such a way that the conditional
distribution is a multivariate $t$ distribution; in other words, assume that
$X_{t+1} \mid \mathcal{F}_t \sim t_d(\nu, \mu, \Sigma)$, where this notation was explained in Section 6.4 (see
Example 6.7). Then, conditional on $\mathcal{F}_t$, we get from (9.23) that

$$L^{\Delta}_{t+1} = l^{\Delta}_{[t]}(X_{t+1}) \sim t(\nu, \; -c_t - b_t' \mu, \; b_t' \Sigma b_t), \tag{9.25}$$

and risk measures can be calculated using (2.19) and (2.25) in Example 2.14.


9.2.3 Historical Simulation


Historical simulation is by far the most popular method used by banks for the 
trading book; Pérignon and Smith (2010) report that 73% of US and international 
commercial banks that disclose their methodology use this method. Moreover, many 
of these firms use historical simulation in a simple unconditional manner, as will 
be explained in this section. In Section 9.2.4 we discuss different approaches to 
adapting historical simulation to give conditional measures of risk that take account 
of changing market volatility. 

Instead of estimating the distribution of $l_{[t]}(X_{t+1})$ under some explicit parametric
model for $X_{t+1}$, the historical-simulation method can be thought of as estimating
the distribution of the loss operator under the empirical distribution of data
$X_{t-n+1}, \ldots, X_t$. The method can be concisely described using the loss-operator
notation; we construct a univariate data set by applying the operator to each of our
historical observations of the risk-factor change vector to get a set of historically
simulated losses:

$$\{\tilde{L}_s = l_{[t]}(X_s) : s = t - n + 1, \ldots, t\}. \tag{9.26}$$

The values $\tilde{L}_s$ show what would happen to the current portfolio if the risk-factor
changes on day $s$ were to recur. We make inferences about the loss distribution and
risk measures using these historically simulated loss data.

If we assume for a moment that the risk-factor changes are iid with df $F_X$ and
write $F_n(l)$ for the empirical df of the data $\tilde{L}_{t-n+1}, \ldots, \tilde{L}_t$, then we may use the
strong law of large numbers to show that, as $n \to \infty$,

$$F_n(l) = \frac{1}{n} \sum_{s=t-n+1}^{t} I_{\{\tilde{L}_s \le l\}} = \frac{1}{n} \sum_{s=t-n+1}^{t} I_{\{l_{[t]}(X_s) \le l\}} \to P(l_{[t]}(X) \le l) = F_L(l),$$

where $X$ is a generic vector of risk-factor changes with distribution $F_X$ and where
$L = l_{[t]}(X)$. Thus $F_n(l)$ is a consistent estimator of the df of $l_{[t]}(X)$ under $F_X$.

The same conclusion will also apply for many strictly stationary time-series mod- 
els, such as GARCH processes, under a suitable adaptation of the strong law of large 
numbers. Since the empirical df of the historically simulated loss data estimates the
distribution of $l_{[t]}(X)$ under $F_X$, historical simulation in its basic form is an unconditional
method.

In practice, there are various ways we can use the historically simulated loss data. 
It is common to estimate VaR using the method of empirical quantile estimation, 
whereby theoretical quantiles of the loss distribution are estimated by sample quan- 
tiles of the data. As an alternative, the EVT-based methods of Section 5.2 can also 
be used to derive parametric estimates of the tail of the loss distribution. Further 
discussion of these topics is deferred to Section 9.2.6. 
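
The basic procedure can be sketched as follows. The code applies a loss operator to each historical risk-factor change, as in (9.26), and estimates VaR by an empirical quantile. The two-stock portfolio with log-return risk factors, the simulated "history" and the function names are hypothetical, and the expected shortfall estimator shown (the average of the losses at or beyond the empirical VaR) is one simple convention among several.

```python
import math
import random

def historical_losses(loss_operator, X_hist):
    """Apply the current loss operator to each historical risk-factor change."""
    return [loss_operator(x) for x in X_hist]

def empirical_var(losses, alpha):
    """Empirical alpha-quantile: the smallest loss l with F_n(l) >= alpha."""
    srt = sorted(losses)
    k = math.ceil(alpha * len(srt))        # 1-based index of the order statistic
    return srt[max(k, 1) - 1]

def empirical_es(losses, alpha):
    """Average of the losses greater than or equal to the empirical VaR."""
    q = empirical_var(losses, alpha)
    tail = [l for l in losses if l >= q]
    return sum(tail) / len(tail)

# Hypothetical two-stock portfolio with log-return risk factors.
v_t = 1_000_000.0
w = (0.6, 0.4)

def loss_operator(x):
    """Unlinearized loss for log-return changes x = (x_1, x_2)."""
    return -v_t * sum(w_i * (math.exp(x_i) - 1.0) for w_i, x_i in zip(w, x))

# Simulated stand-in for n = 1000 days of historical risk-factor changes.
rng = random.Random(42)
X_hist = [(rng.gauss(0.0, 0.01), rng.gauss(0.0, 0.02)) for _ in range(1000)]

losses = historical_losses(loss_operator, X_hist)
var_99 = empirical_var(losses, 0.99)
es_99 = empirical_es(losses, 0.99)
print("99% VaR:", round(var_99), " 99% ES:", round(es_99))
```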


Strengths and weaknesses of the method. The historical-simulation method has 
obvious attractions: it is easy to implement and reduces the risk-measure estimation
problem to a one-dimensional problem; no statistical estimation of the multivariate
distribution of $X$ is necessary, and no assumptions about the dependence structure
of risk-factor changes are made; we usually work with the original loss operator and 
not a linearized approximation. 

However, as we have observed, it is an unconditional method and is therefore 
unsuited to giving dynamic, conditional measures of risk that capture the volatile 
nature of risk in the trading book. Advocates of historical simulation often find 
a virtue in the fact that it gives more stable, less volatile estimates of risk than a 
conditional method. But it is more prudent to separate the statistical or econometric 
problem of accurately estimating risk measures from the regulatory problem of 
imposing more stable capital requirements; to some extent, stability is behind the 
sixty-day smoothing used in (2.20). For this reason we look at dynamic extensions 
of historical simulation in Section 9.2.4. 

Another issue is that the success of the approach is dependent on our ability 
to collect sufficient quantities of relevant, synchronized data for all risk factors. 
Whenever there are gaps in the risk-factor history, or whenever new risk factors are 
introduced into the modelling, there may be problems filling the gaps and completing 
the historical record. These problems will tend to reduce the effective value of n and 
mean that empirical estimates of VaR and expected shortfall have very poor accuracy. 
Ideally we want n to be fairly large, since the method is an unconditional method 
and we want a number of extreme scenarios in the historical record to provide more 
informative estimates of the tail of the loss distribution. 

The method has been described as being like “driving a car while looking through
the rear view mirror”, a deficiency that is shared to an extent by all purely statistical
procedures. It is for this reason that the stressed VaR calculations mentioned in 
Section 9.2.1 were introduced by regulators. 

Finally, although the historical-simulation method can be easily described, it may 
prove difficult to implement efficiently for large portfolios of derivative instruments. 
Computing the historically simulated losses in (9.26) involves what practitioners
refer to as full revaluation of the portfolio under each of the historical scenarios $X_s$,
and this may be computationally costly. To get round the problem of full revaluation
in producing the simulated losses in (9.26), we can consider substituting the
quadratic loss operator for the loss operator $l_{[t]}$ and working with second-order
approximations to the losses. This means that only the risk-factor sensitivities are
required, and it is these that are often routinely calculated for hedging purposes
anyway.


9.2.4 Dynamic Historical Simulation 


In this section we present two approaches to incorporating volatility forecasting into 
historical simulation: a univariate approach based on univariate volatility prediction 
using the kind of models presented in Chapter 4, and a multivariate approach. 


A univariate approach to dynamic historical simulation. For a given loss operator
$l_{[t]}$ at time $t$, we recall the construction of the historical simulation data
$\{\tilde{L}_s = l_{[t]}(X_s) : s = t-n+1, \dots, t\}$ in (9.26). We assume that these are realizations from
a stationary univariate stochastic process $(\tilde{L}_s)$ obtained by applying the function
$l_{[t]} : \mathbb{R}^d \to \mathbb{R}$ to a stationary multivariate process of risk-factor changes $(X_s)$, and
we also assume that $L_{t+1} = l_{[t]}(X_{t+1})$ is the next random variable in this process.

Moreover, we assume that the stationary process $(\tilde{L}_s)$ satisfies, for all $s$, equations
of the form $\tilde{L}_s = \mu_s + \sigma_s Z_s$, where $\mu_s$ is an $\mathcal{F}_{s-1}$-measurable conditional mean
term, $\sigma_s$ is an $\mathcal{F}_{s-1}$-measurable volatility, and the $(Z_s)$ are SWN(0, 1) innovations
with df $F_Z$. An example of a model satisfying these assumptions would be an ARMA
process with GARCH errors as defined in Section 4.2.3. As shown in Section 4.2.5,
we can derive simple formulas for the VaR and expected shortfall of the conditional
loss distribution $F_{L_{t+1} \mid \mathcal{F}_t}$ under these assumptions.

Writing $\mathrm{VaR}^t_\alpha$ for the $\alpha$-quantile of $F_{L_{t+1} \mid \mathcal{F}_t}$ and $\mathrm{ES}^t_\alpha$ for the corresponding
expected shortfall, we obtain


$$\mathrm{VaR}^t_\alpha = \mu_{t+1} + \sigma_{t+1}\, q_\alpha(Z), \qquad \mathrm{ES}^t_\alpha = \mu_{t+1} + \sigma_{t+1}\, \mathrm{ES}_\alpha(Z), \tag{9.27}$$


where $Z$ is a generic rv with the df $F_Z$.

To estimate the risk measures in (9.27), we require estimates of $\mu_{t+1}$ and $\sigma_{t+1}$
and estimates of the quantile and expected shortfall of the innovation df $F_Z$. In
a model with Gaussian innovations the latter need not be estimated and are simply
$q_\alpha(Z) = \Phi^{-1}(\alpha)$ and $\mathrm{ES}_\alpha(Z) = \phi(\Phi^{-1}(\alpha))/(1-\alpha)$, where the latter formula
was derived in Example 2.14. In a model with non-Gaussian innovations, $q_\alpha(Z)$
and $\mathrm{ES}_\alpha(Z)$ depend on any further parameters of the innovation distribution. For
example, we might assume (scaled) $t$ innovations; in this case, the quantile and
expected shortfall of a standard univariate $t$ distribution (the latter given in (2.25))
would have to be scaled by the factor $\sqrt{(\nu-2)/\nu}$ to take account of the fact that
the innovation distribution is assumed to have variance 1.
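As a minimal sketch of the Gaussian case, the following Python code evaluates $q_\alpha(Z)$ and $\mathrm{ES}_\alpha(Z)$ and plugs them into (9.27). The function names and the numerical inputs (a zero conditional mean and a volatility forecast of 2) are illustrative assumptions, not values from the text.

```python
from statistics import NormalDist

def gaussian_innovation_measures(alpha):
    """Return (q_alpha(Z), ES_alpha(Z)) for a standard normal innovation Z."""
    std = NormalDist()
    q = std.inv_cdf(alpha)            # Phi^{-1}(alpha)
    es = std.pdf(q) / (1.0 - alpha)   # phi(Phi^{-1}(alpha)) / (1 - alpha)
    return q, es

def conditional_risk_measures(mu_next, sigma_next, alpha):
    """Apply (9.27) given forecasts of the conditional mean and volatility."""
    q, es = gaussian_innovation_measures(alpha)
    return mu_next + sigma_next * q, mu_next + sigma_next * es

# Hypothetical forecasts: mu_{t+1} = 0 and sigma_{t+1} = 2
var_99, es_99 = conditional_risk_measures(mu_next=0.0, sigma_next=2.0, alpha=0.99)
```

For non-Gaussian innovations only `gaussian_innovation_measures` would need to change; the location-scale step in (9.27) stays the same.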

We now give a number of possible estimation strategies. In all cases the data are
the historical simulation data $\tilde{L}_{t-n+1}, \dots, \tilde{L}_t$.


(1) Fit an ARMA-GARCH model with an appropriate innovation distribution to
the data by the ML method and use the prediction methodology discussed
in Section 4.2.5 to estimate $\sigma_{t+1}$ and $\mu_{t+1}$. Any further parameters of the
innovation distribution can be estimated simultaneously in the model fitting.


For example, suppose we use an AR(1)-GARCH(1,1) model, which,
according to Definition 4.22, takes the form


$$\begin{aligned}
\tilde{L}_s &= \mu_s + \sigma_s Z_s, \\
\mu_s &= \mu + \phi_1(\tilde{L}_{s-1} - \mu), \\
\sigma_s^2 &= \alpha_0 + \alpha_1(\tilde{L}_{s-1} - \mu_{s-1})^2 + \beta_1 \sigma_{s-1}^2,
\end{aligned}$$


at any time $s$. The conditional mean $\mu_{t+1}$ and standard deviation $\sigma_{t+1}$ are
then estimated recursively by


$$\hat{\mu}_{t+1} = \hat{\mu} + \hat{\phi}_1(\tilde{L}_t - \hat{\mu}),$$


$$\hat{\sigma}_{t+1} = \sqrt{\hat{\alpha}_0 + \hat{\alpha}_1(\tilde{L}_t - \hat{\mu}_t)^2 + \hat{\beta}_1 \hat{\sigma}_t^2},$$

where ML estimates of the parameters of the AR(1)-GARCH(1,1) model are
denoted using hats.


(2) Fit an ARMA-GARCH model by QML (see Section 4.2.4) and use prediction
methodology as in strategy (1) to estimate $\sigma_{t+1}$ and $\mu_{t+1}$. In a separate second
step use the model residuals to find estimates of $q_\alpha(Z)$ and $\mathrm{ES}_\alpha(Z)$. As for
the basic historical-simulation method, this can be achieved using simple
empirical estimates of quantiles and expected shortfalls or semi-parametric
estimators based on EVT (see Section 9.2.6).


(3) Use the univariate EWMA procedure (see Section 4.2.5) to estimate
$\sigma_{t-n+1}, \dots, \sigma_t, \sigma_{t+1}$.


The conditional mean terms $\mu_{t-n+1}, \dots, \mu_t, \mu_{t+1}$ could also be estimated
by exponential smoothing but it is easier to set them equal to zero, as
they are likely to be very small. Standardize each of the historical simulation
losses $\tilde{L}_{t-n+1}, \dots, \tilde{L}_t$ by dividing by the EWMA volatility estimates
$\hat{\sigma}_{t-n+1}, \dots, \hat{\sigma}_t$. This yields a set of residuals, from which the innovation
distribution $F_Z$ can be estimated as in strategy (2).
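Strategy (3) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the EWMA recursion is written in the common form $\sigma^2_{s+1} = \lambda \sigma^2_s + (1-\lambda)\tilde{L}_s^2$ with a hypothetical decay parameter $\lambda = 0.94$, it is seeded with the sample second moment, conditional means are set to zero, and the innovation quantile and expected shortfall are estimated by simple empirical estimators.

```python
import math

def ewma_volatilities(losses, decay=0.94):
    """Return [sigma_{t-n+1}, ..., sigma_t, sigma_{t+1}] (length n + 1).

    Assumptions: zero conditional mean; the recursion is seeded with the
    sample second moment of the whole series.
    """
    var = sum(x * x for x in losses) / len(losses)
    sigmas = [math.sqrt(var)]
    for x in losses:
        var = decay * var + (1.0 - decay) * x * x
        sigmas.append(math.sqrt(var))
    return sigmas

def empirical_var_es(sample, alpha):
    """Sample quantile as in (9.29) and the average of the k largest values."""
    ordered = sorted(sample, reverse=True)
    # small tolerance guards against floating-point error in n(1 - alpha)
    k = math.floor(len(ordered) * (1.0 - alpha) + 1e-9) + 1
    return ordered[k - 1], sum(ordered[:k]) / k

def dynamic_hs_ewma(losses, alpha, decay=0.94):
    """Estimate VaR and ES via (9.27) with mu_{t+1} = 0."""
    sigmas = ewma_volatilities(losses, decay)
    # each loss is divided by the volatility estimate for its own period
    residuals = [x / s for x, s in zip(losses, sigmas[:-1])]
    q_z, es_z = empirical_var_es(residuals, alpha)
    return sigmas[-1] * q_z, sigmas[-1] * es_z
```

Because the residuals are scale-free, multiplying every loss by a constant multiplies both risk-measure estimates by the same constant, which is a convenient sanity check.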


These procedures often work well in practice but there can be some loss of infor- 
mation involved with applying volatility modelling at the level of the historically 
simulated data rather than at the level of the risk-factor changes themselves. We now 
present a second method that incorporates volatility at the level of the individual 
risk factors. While the method is more computationally intensive, it can result in 
more accurate estimates of risk measures. 
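For strategy (1), once parameter estimates are available, the recursive prediction of the conditional mean and volatility in an AR(1)-GARCH(1,1) model can be sketched as follows. The parameters are assumed to have been fitted elsewhere (no ML fitting is shown), and starting the recursion at the stationary mean and variance is an assumption made purely for illustration.

```python
import math

def ar1_garch11_forecast(losses, mu, phi1, alpha0, alpha1, beta1):
    """One-step forecasts (mu_{t+1}, sigma_{t+1}) from given parameter values.

    Assumption: the recursion starts at mu_s = mu and
    sigma_s^2 = alpha0 / (1 - alpha1 - beta1).
    """
    mu_s = mu
    var_s = alpha0 / (1.0 - alpha1 - beta1)
    for loss in losses:
        # sigma_{s+1}^2 uses the current-period mean mu_s, so update it first
        var_s = alpha0 + alpha1 * (loss - mu_s) ** 2 + beta1 * var_s
        mu_s = mu + phi1 * (loss - mu)
    return mu_s, math.sqrt(var_s)

# Hypothetical parameter values and a single observed loss
mu_next, sigma_next = ar1_garch11_forecast(
    [1.0], mu=0.0, phi1=0.5, alpha0=0.1, alpha1=0.1, beta1=0.8
)
```

The two forecasts can then be inserted into (9.27) together with estimates of $q_\alpha(Z)$ and $\mathrm{ES}_\alpha(Z)$.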


A multivariate approach to dynamic historical simulation. In this method we work
with risk-factor change data $X_{t-n+1}, \dots, X_t$ and assume that the data vectors are
realizations from a multivariate time-series process $(X_s)$ that satisfies equations of
the form


$$X_s = \mu_s + \Delta_s Z_s, \qquad \Delta_s = \operatorname{diag}(\sigma_{s,1}, \dots, \sigma_{s,d}),$$


where $(\mu_s)$ is a process of vectors and $(\Delta_s)$ a process of diagonal matrices such
that $\mu_{s,1}, \dots, \mu_{s,d}, \sigma_{s,1}, \dots, \sigma_{s,d}$ are all $\mathcal{F}_{s-1}$-measurable and $(Z_s) \sim \mathrm{SWN}(0, P)$
for some correlation matrix $P$ (in other words, the $Z_s$ are iid random vectors whose
covariance matrix is the correlation matrix $P$). Under these assumptions, $E(X_{s,k} \mid
\mathcal{F}_{s-1}) = \mu_{s,k}$ and $\operatorname{var}(X_{s,k} \mid \mathcal{F}_{s-1}) = \sigma_{s,k}^2$, so the vector $\mu_s$ contains the conditional
means and the matrix $\Delta_s$ contains the volatilities of the component series at time $s$.
An example of a model that fits into this framework is the CCC-GARCH (constant
conditional correlation) process (see Definition 14.11).

In this context we may use multivariate dynamic historical simulation. The
key idea of the method is to apply historical simulation to the unobserved innovations
$(Z_s)$ rather than the observed data $(X_s)$ (as in standard historical simulation).
The first step is to compute estimates $\{\hat{\mu}_s : s = t-n+1, \dots, t\}$ and
$\{\hat{\Delta}_s : s = t-n+1, \dots, t\}$ of the conditional mean vectors and volatility matrices.
This can be achieved by fitting univariate time-series models of ARMA-GARCH
type to each of the component series in turn; alternatively, we can use the univariate
EWMA approach for each series. In either case we also use prediction methodology
to obtain estimates of $\hat{\Delta}_{t+1}$, the volatility matrix in the next time period, and (if
desired) $\hat{\mu}_{t+1}$, the conditional mean vector.

In the second step we construct residuals


$$\{\hat{Z}_s = \hat{\Delta}_s^{-1}(X_s - \hat{\mu}_s) : s = t-n+1, \dots, t\}$$


and treat these as “observations” of the unobserved innovations. To make statistical
inferences about the distribution of $L_{t+1} = l_{[t]}(X_{t+1}) = l_{[t]}(\mu_{t+1} + \Delta_{t+1} Z_{t+1})$
given $\mathcal{F}_t$ we construct the data set


$$\{\tilde{L}_s = l_{[t]}(\hat{\mu}_{t+1} + \hat{\Delta}_{t+1} \hat{Z}_s) : s = t-n+1, \dots, t\}. \tag{9.28}$$


To estimate VaR (or expected shortfall) we can apply simple empirical estimators 
or EVT-based methods directly to these data (see Section 9.2.6 for more details). 
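The two steps leading to (9.28) can be sketched as follows. The sketch assumes componentwise EWMA volatilities with a hypothetical decay parameter, zero conditional means, and a user-supplied loss operator; the data and the linear loss operator in the example are made up for illustration.

```python
import math

def ewma_vols(series, decay=0.94):
    """EWMA volatilities for one component; the last entry is the forecast."""
    var = sum(x * x for x in series) / len(series)   # seed (an assumption)
    vols = [math.sqrt(var)]
    for x in series:
        var = decay * var + (1.0 - decay) * x * x
        vols.append(math.sqrt(var))
    return vols

def multivariate_dynamic_hs(X, loss_operator, decay=0.94):
    """Construct the data set (9.28) from n risk-factor change vectors X."""
    d = len(X[0])
    vols = [ewma_vols([x[k] for x in X], decay) for k in range(d)]
    simulated = []
    for s, x in enumerate(X):
        # residual Z_s = Delta_s^{-1} X_s (conditional means set to zero)
        z = [x[k] / vols[k][s] for k in range(d)]
        # rescale by the next-period volatilities, then revalue the portfolio
        scenario = [vols[k][-1] * z[k] for k in range(d)]
        simulated.append(loss_operator(scenario))
    return simulated

# Hypothetical two-factor data and a linear loss operator
X_example = [[0.01, -0.02], [-0.03, 0.01], [0.02, 0.02], [-0.01, -0.04]]

def linear_loss(x):
    return -(2.0 * x[0] + x[1])

dynamic_losses = multivariate_dynamic_hs(X_example, linear_loss, decay=0.9)
```

With `decay=1.0` the volatility estimates never change, so the procedure collapses to standard historical simulation; this makes the role of the rescaling step easy to see.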


9.2.5 Monte Carlo 


The Monte Carlo method is a rather general name for any approach to risk mea- 
surement that involves the simulation of an explicit parametric model for risk-factor 
changes. Many banks report that they use a Monte Carlo method to compute mea- 
sures of market risk in the trading book (Pérignon and Smith 2010). However, the 
Monte Carlo method only offers a solution to the problem of evaluating the distribution
of $L_{t+1} = l_{[t]}(X_{t+1})$ under a given model for $X_{t+1}$. It does not solve the
statistical problem of finding a suitable model for $X_{t+1}$.

In the market-risk context let us assume that we have estimated a time-series
model for historical risk-factor change data $X_{t-n+1}, \dots, X_t$ and that this is a model
from which we can readily simulate. We use the model to generate $m$ independent
realizations $\tilde{X}^{(1)}_{t+1}, \dots, \tilde{X}^{(m)}_{t+1}$ from the estimated conditional distribution of
risk-factor changes $\hat{F}_{X_{t+1} \mid \mathcal{F}_t}$.

In a similar fashion to the historical-simulation method, we apply the loss operator
to these simulated vectors to obtain simulated realizations $\{\tilde{L}^{(i)}_{t+1} = l_{[t]}(\tilde{X}^{(i)}_{t+1}) :
i = 1, \dots, m\}$ from the estimated conditional loss distribution $\hat{F}_{L_{t+1} \mid \mathcal{F}_t}$. As for the
historical-simulation method, the simulated loss data from the Monte Carlo method
are used to estimate VaR and expected shortfall, e.g. by using simple empirical
estimators or EVT-based methods (see Section 9.2.6 for more details).

Note that the use of Monte Carlo means that we are free to choose the number
of replications $m$ ourselves, within the obvious constraints of computation time.
Generally, $m$ can be chosen to be much larger than $n$ (the number of data) so we
obtain more accuracy in empirical VaR and expected shortfall estimates than is
possible in the case of historical simulation.
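A minimal sketch of the Monte Carlo method, assuming (purely for illustration) that the estimated conditional model for $X_{t+1}$ is multivariate normal with a given mean vector and Cholesky factor, and that the loss operator is a simple linear function. None of these choices comes from the text; any model from which we can simulate could be substituted.

```python
import math
import random

def monte_carlo_losses(mu, chol, loss_operator, m, seed=0):
    """Simulate m losses under X_{t+1} ~ N_d(mu, A A'), with A = chol lower triangular."""
    rng = random.Random(seed)
    d = len(mu)
    losses = []
    for _ in range(m):
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]
        x = [mu[i] + sum(chol[i][j] * z[j] for j in range(i + 1)) for i in range(d)]
        losses.append(loss_operator(x))
    return losses

def empirical_var_es(sample, alpha):
    """Sample quantile as in (9.29) and the average of the k largest values."""
    ordered = sorted(sample, reverse=True)
    # small tolerance guards against floating-point error in m(1 - alpha)
    k = math.floor(len(ordered) * (1.0 - alpha) + 1e-9) + 1
    return ordered[k - 1], sum(ordered[:k]) / k

# Hypothetical two-factor model with loss operator l(x) = -(x_1 + x_2)
mc_losses = monte_carlo_losses(
    mu=[0.0, 0.0],
    chol=[[1.0, 0.0], [0.5, 1.0]],
    loss_operator=lambda x: -(x[0] + x[1]),
    m=20000,
)
var_mc, es_mc = empirical_var_es(mc_losses, 0.99)
```

In this toy example the loss is exactly normal with variance 3.25, so the simulated VaR can be checked against $\sqrt{3.25}\,\Phi^{-1}(0.99)$; for a realistic portfolio no such closed form would be available.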


Weaknesses of the method. As we have already remarked, the method does not
solve the problem of finding a multivariate model for $F_{X_{t+1} \mid \mathcal{F}_t}$ and any results that
are obtained will only be as good as the model that is used.

For large portfolios the computational cost of the Monte Carlo approach can be
considerable, as every simulation ideally requires the full revaluation of the portfolio
to compute the loss operator. This is particularly problematic if the portfolio
contains many derivatives that cannot be priced in closed form. The problem of
computational cost is even more relevant to the Monte Carlo method than it is to
the historical-simulation method, because we typically choose larger numbers of
scenarios $m$ for risk-factor changes in the Monte Carlo method. If second-order
sensitivities are available, the loss operator can be replaced by the quadratic loss
operator to reduce the computational cost. Moreover, variance-reduction techniques
for evaluating tail probabilities and quantiles, such as importance sampling,
can also be of help (see Notes and Comments).


9.2.6 Estimating Risk Measures 


In both the historical simulation and Monte Carlo methods we estimate risk measures 
using simulated loss data. In this section we discuss different methods for estimating 
VaR and expected shortfall from a data sample. Let us suppose that we have data
$L_1, \dots, L_n$ from an underlying loss distribution $F_L$ and the aim is to estimate
$\mathrm{VaR}_\alpha = q_\alpha(F_L) = F_L^{\leftarrow}(\alpha)$ or $\mathrm{ES}_\alpha = (1-\alpha)^{-1} \int_\alpha^1 q_\theta(F_L)\, d\theta$.


L-estimators. These estimators take the form of linear combinations of sample
order statistics, and the “L” in their name refers to linear. In Chapter 5 we defined
the upper-order statistics $L_{1,n} \ge \cdots \ge L_{n,n}$, as is standard in extreme value theory.
Many of the results concerning L-estimators are given in terms of lower-order statistics
$L_{(1)} \le \cdots \le L_{(n)}$. Note that we can easily move between the two conventions
by observing that $L_{k,n} = L_{(n-k+1)}$ for $k = 1, \dots, n$.

The simplest L-estimator of VaR is the sample quantile obtained by inverting the
empirical distribution function $F_n(x) = n^{-1} \sum_{i=1}^n I_{\{L_i \le x\}}$ of the data $L_1, \dots, L_n$.
It may be easily verified that the inverse of the empirical df is given by

$$F_n^{\leftarrow}(\alpha) = L_{(k)} \quad \text{for } \frac{k-1}{n} < \alpha \le \frac{k}{n}.$$


We may write this more compactly as $F_n^{\leftarrow}(\alpha) = L_{(\lceil n\alpha \rceil)}$, where $\lceil x \rceil = \min\{k \in
\mathbb{Z} : k \ge x\}$. This is the ceiling function that gives the smallest integer not less than $x$.
In working with order statistics we often use both the ceiling function and the floor
function, $\lfloor x \rfloor = \max\{k \in \mathbb{Z} : k \le x\}$, the largest integer not greater than $x$. It is easy
to see that they are related by $\lfloor -x \rfloor = -\lceil x \rceil$. This fact, together with the relation
$L_{k,n} = L_{(n-k+1)}$, allows us to write the sample quantile in terms of upper-order
statistics. We have that $L_{(\lceil n\alpha \rceil)} = L_{k,n}$, where $k = n - \lceil n\alpha \rceil + 1 = \lfloor n(1-\alpha) \rfloor + 1$,
giving

$$\widehat{\mathrm{VaR}}_\alpha = L_{k,n}, \qquad k = \lfloor n(1-\alpha) \rfloor + 1. \tag{9.29}$$


For example, if $n = 1000$ and $\alpha = 0.995$, the estimator would be $L_{6,1000}$, the sixth
largest value. For the same data and $\alpha = 0.9945$ the estimator is also $L_{6,1000}$.
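The index calculation in (9.29) is easy to get wrong in floating-point arithmetic, because $n(1-\alpha)$ is often an exact integer. The sketch below, with function names chosen for illustration, therefore evaluates the floor using exact rational arithmetic; it assumes $\alpha$ is supplied as a decimal number such as 0.995 and that $0 < \alpha < 1$.

```python
from fractions import Fraction
from math import floor

def sample_quantile_index(n, alpha):
    """Upper-order-statistic index k = floor(n(1 - alpha)) + 1 from (9.29)."""
    a = Fraction(str(alpha))   # exact value of the decimal representation
    return floor(n * (1 - a)) + 1

def var_estimate(losses, alpha):
    """Sample quantile estimator of VaR: the k-th largest loss."""
    ordered = sorted(losses, reverse=True)   # L_{1,n} >= ... >= L_{n,n}
    return ordered[sample_quantile_index(len(ordered), alpha) - 1]

# The two cases discussed in the text, both giving k = 6
k_a = sample_quantile_index(1000, 0.995)
k_b = sample_quantile_index(1000, 0.9945)
```

Both calls reproduce the sixth largest value quoted above.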
Inverting the empirical distribution function yields a sample quantile function that
is discontinuous in $\alpha$. To obtain a continuous function in $\alpha$ there are a number of
alternative definitions of sample quantiles that interpolate linearly between adjacent
order statistics. For example, the default method in the statistical package R estimates
the $\alpha$-quantile to be


$$\widehat{\mathrm{VaR}}_\alpha = \lambda_{\alpha,k,n} L_{k+1,n} + (1 - \lambda_{\alpha,k,n}) L_{k,n}, \qquad k = \lceil (n-1)(1-\alpha) \rceil, \tag{9.30}$$


where the weights are given by $\lambda_{\alpha,k,n} = (n-k) - (n-1)\alpha$. If $n = 1000$ and
$\alpha = 0.995$, then $k = 5$ and the estimator is $0.995 L_{6,1000} + 0.005 L_{5,1000}$. If we want
to estimate the 0.9945 quantile from the same data, then $k = 6$ and the estimator
becomes $0.4945 L_{7,1000} + 0.5055 L_{6,1000}$.
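The interpolated estimator (9.30) can be sketched in the same way. The function names are illustrative, and exact rational arithmetic is again used for the index and the weight; the worked values from the text serve as a check.

```python
from fractions import Fraction
from math import ceil

def interpolated_quantile_weights(n, alpha):
    """Return (k, lambda) with k = ceil((n-1)(1-alpha)), lambda = (n-k) - (n-1)alpha."""
    a = Fraction(str(alpha))
    k = ceil((n - 1) * (1 - a))
    lam = (n - k) - (n - 1) * a
    return k, lam

def var_interpolated(losses, alpha):
    """Linear interpolation between L_{k+1,n} and L_{k,n} as in (9.30)."""
    ordered = sorted(losses, reverse=True)   # ordered[j - 1] is L_{j,n}
    k, lam = interpolated_quantile_weights(len(ordered), alpha)
    if k == 0:   # alpha = 1: the estimate is the sample maximum
        return ordered[0]
    return float(lam) * ordered[k] + float(1 - lam) * ordered[k - 1]

weights_a = interpolated_quantile_weights(1000, 0.995)    # k = 5, weight 0.995
weights_b = interpolated_quantile_weights(1000, 0.9945)   # k = 6, weight 0.4945
```

Unlike (9.29), the result changes continuously as $\alpha$ moves between the two examples.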

The estimators (9.29) and (9.30), being based on only one or two order statistics, 
are subject to a large variance, particularly for quantiles in the tail of the distribution 
and for small sample sizes. 

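To obtain an L-estimator of expected shortfall we recall from Section 8.2.1 the
general form of the distortion risk measures, of which expected shortfall is a special
case. Distortion risk measures are given by


$$\varrho(L) = \int_0^1 F_L^{\leftarrow}(u)\, dD(u)$$


for convex distortion functions $D$ on $[0, 1]$; the distortion function for expected
shortfall is $D_\alpha(u) = (1-\alpha)^{-1}(u-\alpha)^+$. L-estimators for distortion risk measures
may be derived by inserting the inverse of the empirical df as an estimator of $F_L^{\leftarrow}$
to obtain


$$\hat{\varrho}(L) = \int_0^1 F_n^{\leftarrow}(u)\, dD(u) = \sum_{k=1}^n L_{(k)} \left( D\!\left(\frac{k}{n}\right) - D\!\left(\frac{k-1}{n}\right) \right).$$


In the special case of expected shortfall the estimator is


$$\begin{aligned}
\widehat{\mathrm{ES}}_\alpha &= \frac{1}{n(1-\alpha)} \sum_{k=1}^n L_{(k)} \left( (k - n\alpha)^+ - ((k-1) - n\alpha)^+ \right) \\
&= \frac{1}{n(1-\alpha)} \left( \sum_{k=1}^{\lfloor n(1-\alpha) \rfloor} L_{k,n} + (\lceil n\alpha \rceil - n\alpha)\, L_{\lfloor n(1-\alpha) \rfloor + 1, n} \right).
\end{aligned}$$


The final term involving $L_{\lfloor n(1-\alpha) \rfloor + 1, n}$ may sometimes be omitted for a simpler
estimator.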
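The two forms of the expected shortfall L-estimator can be checked against each other numerically. The following sketch, with illustrative function names and assuming $0 < \alpha < 1$, implements the upper-order-statistic form and the distortion-weight form side by side.

```python
from fractions import Fraction
from math import ceil, floor

def es_l_estimator(losses, alpha):
    """Upper-order-statistic form of the expected shortfall L-estimator."""
    ordered = sorted(losses, reverse=True)   # ordered[j - 1] is L_{j,n}
    n = len(ordered)
    a = Fraction(str(alpha))
    m = floor(n * (1 - a))                   # order statistics with full weight
    total = sum(ordered[:m])
    boundary_weight = ceil(n * a) - n * a    # zero when n * alpha is an integer
    if boundary_weight > 0:
        total += float(boundary_weight) * ordered[m]
    return total / float(n * (1 - a))

def es_distortion_form(losses, alpha):
    """Same estimator written as a weighted sum over lower-order statistics."""
    ordered = sorted(losses)                 # L_(1) <= ... <= L_(n)
    n = len(ordered)
    a = Fraction(str(alpha))
    total = 0.0
    for k, value in enumerate(ordered, start=1):
        weight = max(k - n * a, 0) - max(k - 1 - n * a, 0)
        total += float(weight) * value
    return total / float(n * (1 - a))

# Small illustration: the losses 1, ..., 10 with alpha = 0.75
es_small = es_l_estimator(range(1, 11), 0.75)   # (10 + 9 + 0.5 * 8) / 2.5
```

Here $n(1-\alpha) = 2.5$, so the two largest losses receive full weight and the third largest receives weight $\lceil 7.5 \rceil - 7.5 = 0.5$.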


EVT-based estimators. Simple empirical estimates of the VaR and, especially, the 
expected shortfall are likely to be inaccurate when n is of modest size (say only a 
few years of daily data). This is a problem for historical simulation in particular. A 
possible solution is to use the techniques of extreme value theory (EVT) to provide
estimates of the tail of the loss distribution that are as faithful as possible to the
most extreme data and that use parametric forms that are supported by theory. In
Section 5.2.3 we presented a standard EVT method based on the generalized Pareto
distribution that is useful in this context.

To use this method to estimate $\mathrm{VaR}_\alpha$ and $\mathrm{ES}_\alpha$ we can set a high threshold $u =
L_{k+1,n}$ at the $(k+1)$-upper-order statistic and fit a GPD to excess losses
over $u$. We thereby obtain ML estimates $\hat{\xi}$ and $\hat{\beta}$ based on $k$ exceedances of the
threshold. To form a quantile estimator, the value $k$ must satisfy $k/n > 1 - \alpha$;
moreover, $k$ should be sufficiently large to give reasonably accurate estimates of the
GPD parameters.

We then form the risk-measure estimates


$$\widehat{\mathrm{VaR}}_\alpha = u + \frac{\hat{\beta}}{\hat{\xi}} \left( \left( \frac{1-\alpha}{k/n} \right)^{-\hat{\xi}} - 1 \right),$$


$$\widehat{\mathrm{ES}}_\alpha = \frac{\widehat{\mathrm{VaR}}_\alpha}{1 - \hat{\xi}} + \frac{\hat{\beta} - \hat{\xi} u}{1 - \hat{\xi}}.$$

For more guidance on the choice of threshold, see Section 5.2.2; for a compari- 
son of the EVT quantile estimates with simple empirical quantile estimates, see 
Section 5.2.5. 
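Given fitted GPD parameters, the EVT-based risk-measure estimates are a direct plug-in calculation. The sketch below assumes the estimates $\hat{\xi}$ and $\hat{\beta}$ have already been obtained (the ML fitting step is not shown), uses the standard GPD tail formulas with the exceedance probability estimated by $k/n$, and uses made-up numbers in the example.

```python
def gpd_var_es(u, xi, beta, k, n, alpha):
    """VaR and ES estimates from a GPD fitted to k excesses over threshold u.

    Requires k / n > 1 - alpha, xi != 0 and xi < 1.
    """
    if not k / n > 1.0 - alpha:
        raise ValueError("need k / n > 1 - alpha to estimate this quantile")
    if xi == 0.0 or xi >= 1.0:
        raise ValueError("this sketch assumes xi != 0 and xi < 1")
    var = u + (beta / xi) * (((1.0 - alpha) / (k / n)) ** (-xi) - 1.0)
    es = var / (1.0 - xi) + (beta - xi * u) / (1.0 - xi)
    return var, es

# Hypothetical fit: threshold 2.0, 100 exceedances out of 1000 observations
var_evt, es_evt = gpd_var_es(u=2.0, xi=0.25, beta=1.0, k=100, n=1000, alpha=0.99)
```

A useful consistency check is that the difference between the two estimates equals $(\hat{\beta} + \hat{\xi}(\widehat{\mathrm{VaR}}_\alpha - u))/(1 - \hat{\xi})$, which follows by rearranging the expected shortfall formula.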


9.2.7 Losses over Several Periods and Scaling 


In the banking context the methods we have described in previous sections are 
generally applied to daily risk-factor change data, and risk measures are routinely 
calculated for a one-day horizon. However, for regulatory capital purposes there is 
a requirement to calculate a 99% VaR estimate for a period of ten trading days (two 
weeks). 

An obvious approach to this calculation is to model historical risk-factor changes 
over ten-day intervals using exactly the same methodology that has been discussed 
in this chapter. However, for a fixed amount n of historical time-series data, this 
results in a dramatic reduction in the precision of the statistical estimates of model 
parameters. For example, if we have n = 1000 days (just under four years) of 
historical data, this would give only 100 non-overlapping observations of ten-day 
risk-factor changes. To obtain similar accuracy to an analysis of the daily returns we 
would have to collect n = 10000 daily data (around thirty-eight years). It is possible 
to artificially preserve the value of n by the formation of overlapping risk-factor 
returns (a construction that is described in Section 3.1). However, this introduces 
new serial dependencies into the data, which complicates statistical modelling and 
does not lead to an obvious gain in statistical accuracy. 

For these reasons most banks use a simple scaling rule, known as the square-root- 
of-time rule, to move between estimates of one-day VaR and estimates of ten-day 
VaR. We now look at the (limited) theoretical support for this rule and discuss an 
alternative Monte Carlo approach.

Scaling. For an integer $h > 1$ suppose we denote the loss from time $t$ over the
next $h$ periods by $L^{(h)}_{t+h}$. Arguing as in (9.3) and (9.4) we have


$$\begin{aligned}
L^{(h)}_{t+h} &= -(V_{t+h} - V_t) \\
&= -(g(\tau_{t+h}, Z_{t+h}) - g(\tau_t, Z_t)) \\
&= -(g(\tau_{t+h}, Z_t + X_{t+1} + \cdots + X_{t+h}) - g(\tau_t, Z_t)) \\
&=: l^{(h)}_{[t]}\left( \sum_{i=1}^h X_{t+i} \right),
\end{aligned}$$


where $l^{(h)}_{[t]}$ represents a loss operator at time $t$ for the $h$-period loss. The general
question of interest is how risk measures applied to the conditional distribution of
$L^{(h)}_{t+h}$ given $\mathcal{F}_t$ scale with $h$, and this has no simple answer except in special cases.
Note that the $h$-period loss operator differs from the one-period loss operator in
situations where the mapping depends explicitly on time (such as derivative portfolios).
For simplicity let us consider the case in which the mapping does not depend
on calendar time, so that $l^{(h)}_{[t]}(x) = l_{[t]}(x)$. The linearized form of this operator is
of the form $l^{\Delta}_{[t]}(x) = b_t' x$ for some vector $b_t$ that is known at time $t$. We look at the
simpler problem of scaling for risk measures applied to the linearized loss distribution:


$$L^{(h)\Delta}_{t+h} = l^{\Delta}_{[t]}\left( \sum_{i=1}^h X_{t+i} \right) = \sum_{i=1}^h b_t' X_{t+i}. \tag{9.31}$$

The following example gives a justification for the square-root-of-time rule.


Example 9.4 (square-root-of-time scaling). Suppose the risk-factor change vectors
are iid with distribution $N_d(0, \Sigma)$. Then $\sum_{i=1}^h X_{t+i} \sim N_d(0, h\Sigma)$ and the
distribution of $L^{(h)\Delta}_{t+h}$ in (9.31) satisfies $L^{(h)\Delta}_{t+h} \sim N(0, h\, b_t' \Sigma b_t)$. It then follows easily
from (2.18) and (2.24) that both quantiles and expected shortfalls for this distribution
scale according to the square root of time ($\sqrt{h}$). For example, writing $\mathrm{ES}^{(h)}_\alpha$ for
the expected shortfall, we have


$$\mathrm{ES}^{(h)}_\alpha = \sqrt{h}\, \sigma\, \frac{\phi(\Phi^{-1}(\alpha))}{1 - \alpha},$$

where $\sigma^2 = b_t' \Sigma b_t$. Clearly, $\mathrm{ES}^{(h)}_\alpha = \sqrt{h}\, \mathrm{ES}^{(1)}_\alpha$ and, with similar notation, $\mathrm{VaR}^{(h)}_\alpha =
\sqrt{h}\, \mathrm{VaR}^{(1)}_\alpha$.


Although this scaling rule is quite commonly used in practice, empirical risk-factor
change data generally support neither a Gaussian distributional assumption
nor an iid assumption (see Section 3.1). Moreover, for the kinds of dynamic time-series
model (such as GARCH) that are appropriate, very little is known about the
scaling of risk measures for the conditional loss distribution of the $h$-period loss
$L^{(h)}_{t+h}$ (or its linearized form).
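The Gaussian calculation in Example 9.4 can be verified numerically. In the sketch below the sensitivity vector and covariance matrix are made-up values, and the function name is illustrative; the code evaluates VaR and expected shortfall of the linearized $h$-period loss under the iid normal assumption and compares horizons.

```python
from math import sqrt
from statistics import NormalDist

def gaussian_h_period_measures(b, Sigma, h, alpha):
    """VaR and ES of the linearized h-period loss when X ~ iid N_d(0, Sigma)."""
    d = len(b)
    variance = sum(b[i] * Sigma[i][j] * b[j] for i in range(d) for j in range(d))
    scale = sqrt(h * variance)               # standard deviation of the h-period loss
    std = NormalDist()
    q = std.inv_cdf(alpha)
    return scale * q, scale * std.pdf(q) / (1.0 - alpha)

# Hypothetical sensitivities and daily covariance matrix
b = [100.0, -50.0]
Sigma = [[0.0004, 0.0001], [0.0001, 0.0009]]
var_1, es_1 = gaussian_h_period_measures(b, Sigma, h=1, alpha=0.99)
var_10, es_10 = gaussian_h_period_measures(b, Sigma, h=10, alpha=0.99)
ratio_var, ratio_es = var_10 / var_1, es_10 / es_1   # both equal sqrt(10) up to rounding
```

The ratios equal $\sqrt{10}$ only because of the iid Gaussian assumption; they say nothing about conditional scaling under GARCH-type dynamics.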


Monte Carlo approach. It is possible to use a Monte Carlo approach to the problem
of determining risk measures for the $h$-period conditional loss distribution. Suppose
we have a time-series model for the risk-factor changes $(X_s)_{s \le t}$. We simulate future
paths of the process $\tilde{X}^{(i)}_{t+1}, \dots, \tilde{X}^{(i)}_{t+h}$ for $i = 1, \dots, m$, where $m$ is a predetermined
large number of replications. We then apply the $h$-period loss operator to these
simulated data to obtain Monte Carlo simulated losses:

$$\{\tilde{L}^{(h)(i)}_{t+h} = l^{(h)}_{[t]}(\tilde{X}^{(i)}_{t+1} + \cdots + \tilde{X}^{(i)}_{t+h}) : i = 1, \dots, m\}.$$

These are used to make statistical inferences about the loss distribution and associated
risk measures, as described in Section 9.2.5. We can also use the Monte
Carlo approach to examine the performance of square-root-of-time scaling and to
experiment with alternative power laws (see Notes and Comments).
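A small simulation experiment of this kind might look as follows. It is a sketch under strong simplifying assumptions that are not from the text: a single risk factor following a zero-mean GARCH(1,1) process with Gaussian innovations, a loss equal to the sum of the risk-factor changes, and hypothetical parameter values.

```python
import math
import random

def simulate_h_period_losses(alpha0, alpha1, beta1, sigma2_next, h, m, seed=0):
    """Simulate m h-period losses X_{t+1} + ... + X_{t+h} from a GARCH(1,1) model."""
    rng = random.Random(seed)
    losses = []
    for _ in range(m):
        sigma2, total = sigma2_next, 0.0
        for _ in range(h):
            x = math.sqrt(sigma2) * rng.gauss(0.0, 1.0)
            total += x
            sigma2 = alpha0 + alpha1 * x * x + beta1 * sigma2   # next-period variance
        losses.append(total)
    return losses

def empirical_quantile(sample, alpha):
    """Sample quantile as in (9.29)."""
    ordered = sorted(sample, reverse=True)
    # small tolerance guards against floating-point error in m(1 - alpha)
    k = math.floor(len(ordered) * (1.0 - alpha) + 1e-9) + 1
    return ordered[k - 1]

# Hypothetical fitted parameters and a one-step variance forecast well above
# the stationary variance alpha0 / (1 - alpha1 - beta1)
params = dict(alpha0=1e-6, alpha1=0.1, beta1=0.85)
one_day = simulate_h_period_losses(**params, sigma2_next=4e-4, h=1, m=20000)
ten_day = simulate_h_period_losses(**params, sigma2_next=4e-4, h=10, m=20000)
var_1 = empirical_quantile(one_day, 0.99)
var_10 = empirical_quantile(ten_day, 0.99)
var_10_scaled = math.sqrt(10) * var_1   # square-root-of-time approximation
```

Comparing `var_10` with `var_10_scaled` for different starting volatilities gives a feel for when the square-root-of-time rule is a reasonable approximation to the conditional ten-day VaR.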


Notes and Comments 


Standard methods for market risk are described in detail in Jorion (2007) and Crouhy, 
Galai and Mark (2001). For the variance-covariance approach using EWMA, see
Mina and Xiao (2001). A useful overview of the popularity of different approaches 
in practice is given by Pérignon and Smith (2010). The multivariate approach to 
dynamic historical simulation is described by Hull and White (1998) and Barone- 
Adesi, Bourgoin and Giannopoulos (1998). 

The book by Glasserman (2003) is an excellent general introduction to Monte 
Carlo simulation techniques in finance. Glasserman, Heidelberger and Shahabud- 
din (1999) present efficient numerical techniques (based on delta-gamma approx- 
imations and advanced simulation techniques) for estimating VaR for derivative 
portfolios in the presence of heavy-tailed risk factors. 

For a reference on different definitions of empirical quantile estimates and their 
properties, see Hyndman and Fan (1996). Tsukahara (2009) describes L-estimators 
of distortion risk measures, which apply to the case of expected shortfall. The use of 
EVT to provide dynamic estimates of risk measures was introduced by McNeil and 
Frey (2000), who also highlight the differences between conditional and uncondi- 
tional approaches. For risk-measure estimation applying EVT to a regime-switching 
model, see Chavez-Demoulin, Embrechts and Sardy (2014). 

A useful summary of scaling results for market-risk measures may be found in 
Kaufmann (2004) (see also Brummelhuis and Kaufmann 2007; Embrechts, Kauf- 
mann and Patie 2005). In these papers the message emerges that, for unconditional 
VaR scaling over longer time horizons, the square-root-of-time rule often works well. 
On the other hand, for conditional VaR scaling over short time horizons, McNeil 
and Frey (2000) use the Monte Carlo approach to present evidence against square- 
root-of-time scaling. For further comments on these and further scaling issues, see 
Diebold et al. (1998) and Daníelsson and de Vries (1997c).

For a practically oriented text on market-risk management see Daníelsson (2011). 


9.3 Backtesting 


Backtesting is the practice of evaluating risk measurement procedures by comparing 
out-of-sample estimates of risk measures with actual realized losses and gains. Back- 
testing allows us to address the question of whether a given estimation procedure
produces credible risk-measure estimates. In Section 9.2 we considered standard
methods for estimating risk measures at a time t for the distribution of losses in the 
next period. At the end of the next period we have the opportunity to compare the 
risk-measure estimate with the actual realized loss. When this procedure is repeated 
over many time periods we can monitor the performance of methods and compare 
their relative performance. 

In Section 9.3.1 we discuss the backtesting of VaR estimates, and in Section 9.3.2 
we discuss the backtesting of expected shortfall. Section 9.3.3 examines the use of 
elicitability theory to construct natural scoring statistics for comparing the backtest 
results for different VaR estimation methods. An empirical example of backtesting 
is described in Section 9.3.4, and in Section 9.3.5 we briefly consider backtests of 
the whole estimated loss distribution. 


9.3.1 Violation-Based Tests for VaR 


At any time point $t$ let $\mathrm{VaR}^t_\alpha$ denote the $\alpha$-quantile of the conditional loss
distribution $F_{L_{t+1} \mid \mathcal{F}_t}$. We will refer to the event $\{L_{t+1} > \mathrm{VaR}^t_\alpha\}$ as a VaR violation or
exception and define the event indicator variable by $I_{t+1} = I_{\{L_{t+1} > \mathrm{VaR}^t_\alpha\}}$. Assuming
a continuous loss distribution, we have, by definition of the quantile, that


$$E(I_{t+1} \mid \mathcal{F}_t) = P(L_{t+1} > \mathrm{VaR}^t_\alpha \mid \mathcal{F}_t) = 1 - \alpha, \tag{9.32}$$


so that $I_{t+1}$ is a Bernoulli variable with event probability $1 - \alpha$. Moreover, the
following lemma shows that the sequence of VaR violation indicators $(I_t)$ forms a
Bernoulli trials process, i.e. a process of iid Bernoulli random variables with event
probability $1 - \alpha$.


Lemma 9.5. Let $(Y_t)_{t \in \mathbb{Z}}$ be a sequence of Bernoulli indicator variables adapted to
a filtration $(\mathcal{F}_t)_{t \in \mathbb{Z}}$ and satisfying $E(Y_{t+1} \mid \mathcal{F}_t) = p > 0$ for all $t$. Then $(Y_t)$ is a
process of iid Bernoulli variables.


Proof. The process $(Y_t - p)_{t \in \mathbb{Z}}$ has the martingale-difference property (see Definition 4.6).
Moreover, $\operatorname{var}(Y_t - p) = E(E((Y_t - p)^2 \mid \mathcal{F}_{t-1})) = p(1 - p)$ for all $t$.
As $(Y_t - p)$ is a martingale-difference sequence with a finite variance, it is a white
noise process (see Section 4.1.1). Hence $(Y_t)$ is a white noise process of uncorrelated
variables. If two Bernoulli variables $Y_t$ and $Y_s$ are uncorrelated, it follows
that


$$\begin{aligned}
0 = \operatorname{cov}(Y_t, Y_s) &= E(Y_t Y_s) - E(Y_t) E(Y_s) \\
&= P(Y_t = 1, Y_s = 1) - P(Y_t = 1) P(Y_s = 1),
\end{aligned}$$


which shows that they are also independent.


There are two important consequences of the independent Bernoulli behaviour
for violations. First, if we sum the violation indicators over a number of different
times, we obtain binomially distributed random variables. For example,
$M = \sum_{t=1}^m I_{t+1} \sim B(m, 1 - \alpha)$. Second, the spacings between consecutive violations
are independent and geometrically distributed. Suppose that the event
$\{L_{t+1} > \mathrm{VaR}^t_\alpha\}$ occurs for times $t \in \{T_1, \dots, T_M\}$, and let $T_0 = 0$. Then the
spacings $S_j = T_j - T_{j-1}$ will be independent geometrically distributed random
variables with mean $1/(1 - \alpha)$, so that $P(S_j = k) = \alpha^{k-1}(1 - \alpha)$ for $k \in \mathbb{N}$. Both
of these properties can be tested on empirical data.
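Both quantities are straightforward to compute from data. The following sketch, with illustrative function names, builds the violation indicators, counts the violations, extracts the spacings between them, and evaluates the geometric probability mass function for comparison.

```python
def violation_indicators(losses, var_values):
    """Indicator of the event that a loss exceeds the VaR set one period earlier."""
    return [1 if loss > var else 0 for loss, var in zip(losses, var_values)]

def violation_spacings(indicators):
    """Spacings S_j = T_j - T_{j-1} with T_0 = 0; times are counted from 1."""
    spacings, previous = [], 0
    for time, flag in enumerate(indicators, start=1):
        if flag:
            spacings.append(time - previous)
            previous = time
    return spacings

def geometric_pmf(k, alpha):
    """P(S_j = k) = alpha^(k-1) (1 - alpha) for k = 1, 2, ..."""
    return alpha ** (k - 1) * (1.0 - alpha)

# Toy indicator sequence with violations at times 3, 5 and 6
example_indicators = [0, 0, 1, 0, 1, 1, 0]
example_count = sum(example_indicators)                    # the binomial count M
example_spacings = violation_spacings(example_indicators)  # [3, 2, 1]
```

With real backtest data the observed count would be compared with $B(m, 1-\alpha)$ and the observed spacings with the geometric distribution.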

Suppose that we now estimate $\mathrm{VaR}^t_\alpha$ based on information available up to time $t$,
and we denote our estimate by $\widehat{\mathrm{VaR}}^t_\alpha$. The empirical violation indicator variable


$$\hat{I}_{t+1} = I_{\{L_{t+1} > \widehat{\mathrm{VaR}}^t_\alpha\}}$$

represents a one-step, out-of-sample comparison, in which we compare the actual
realized value $L_{t+1}$ with our VaR estimate made at time $t$.

Under the null hypothesis that our estimation method is accurate, in the sense that
$E(\hat{I}_{t+1} \mid \mathcal{F}_t) = 1 - \alpha$ at the time points $t = 1, \dots, m$, the sequence of empirical
indicator variables $(\hat{I}_{t+1})_{1 \le t \le m}$ will then form a realization from a Bernoulli trials
process with event probability $1 - \alpha$. For example, the quantity $\sum_{t=1}^m \hat{I}_{t+1}$ should
behave like a realization from a $B(m, 1 - \alpha)$ distribution, and this hypothesis can be
easily addressed with a binomial test. There are a number of varieties of binomial
test; in a two-sided score test we compute the statistic


$$Z_m = \frac{\sum_{t=1}^m \hat{I}_{t+1} - m(1 - \alpha)}{\sqrt{m \alpha (1 - \alpha)}} \tag{9.33}$$

and reject the hypothesis of Bernoulli behaviour at the 5% level if $|Z_m| >
\Phi^{-1}(0.975)$. Rejection would suggest either systematic underestimation or overestimation
of VaR (see Notes and Comments for further references concerning binomial
tests).

To check the independence of violations we can construct a test of the geometric
hypothesis. Since violations should be rare events with probability $(1 - \alpha) \le 0.05$,
it proves easier to use the fact that a discrete-time Bernoulli process for rare events
can be approximated by a continuous-time Poisson process and that the discrete
geometric distribution for the event spacings can be approximated by a continuous
exponential distribution.

To be precise let us suppose that the time interval $[t, t+1]$ in discrete time
has length $\Delta t$ in the chosen unit of continuous time. For example, if $[t, t+1]$
represents a trading day, then $\Delta t = 1$ if time is measured in days and $\Delta t = 1/250$
if time is measured in years. If the Bernoulli rare event probability is $1 - \alpha$, then
the approximating Poisson process has rate $\lambda = (1 - \alpha)/\Delta t$ and the approximating
exponential distribution has parameter $\lambda$ and mean $1/\lambda$.

The exponential hypothesis can be tested using a Q-Q plot of the spacings data
against the quantiles of a standard exponential reference distribution, similar to
the situation discussed in Section 5.3.2. Alternatively, Christoffersen and Pelletier
(2004) have proposed a likelihood ratio test of the hypothesis of exponential spacings
against a more general Weibull alternative. In the exponential model the so-called
hazard function of an event is constant, but the Weibull distribution can model an
event clustering phenomenon whereby the hazard function is initially high after an
event takes place and then decreases (see Section 10.4.1 for more discussion of
hazard rate models).

In Section 9.2.7 we also discussed VaR estimates for the h-period loss distribution.
To use the tests described above on h-period estimates we would have to base our
backtests on non-overlapping periods. For example, if we calculated two-week VaRs,
we could make a comparison of the VaR estimate and the realized loss every two
weeks, which would clearly lead to a relatively small amount of violation data
with which to monitor the performance of the model. It is also possible to look at
overlapping periods, e.g. by recording the violation indicator value every day for
the loss incurred over the previous two weeks. However, this would create a series
of dependent Bernoulli trials for which formal inference is difficult.
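For one-period VaR estimates, the two-sided score test (9.33) and the rate of the approximating Poisson process can be sketched as follows. The function names and the example violation counts are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist

def binomial_score_test(indicators, alpha, level=0.05):
    """Return (Z_m, reject) for the two-sided score test in (9.33)."""
    m = len(indicators)
    z = (sum(indicators) - m * (1.0 - alpha)) / sqrt(m * alpha * (1.0 - alpha))
    critical = NormalDist().inv_cdf(1.0 - level / 2.0)   # Phi^{-1}(0.975) at the 5% level
    return z, abs(z) > critical

def poisson_rate(alpha, delta_t):
    """Rate of the approximating Poisson process: (1 - alpha) / delta_t."""
    return (1.0 - alpha) / delta_t

# Hypothetical backtests of a 99% VaR over 500 days with 5 and 15 violations
z_ok, reject_ok = binomial_score_test([1] * 5 + [0] * 495, alpha=0.99)
z_bad, reject_bad = binomial_score_test([1] * 15 + [0] * 485, alpha=0.99)
yearly_rate = poisson_rate(0.99, 1.0 / 250.0)   # expected violations per year
```

Five violations in 500 days match the expected number exactly, whereas fifteen give a statistic well beyond the critical value, pointing to underestimation of VaR.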


9.3.2 Violation-Based Tests for Expected Shortfall 


It is also possible to use information about the magnitudes of VaR violations to
backtest estimates of expected shortfall. Let $\mathrm{ES}^t_\alpha$ denote the expected shortfall of
the conditional loss distribution $F_{L_{t+1} \mid \mathcal{F}_t}$, and define a violation residual by


$$K_{t+1} = \left( \frac{L_{t+1} - \mathrm{ES}^t_\alpha}{\mathrm{ES}^t_\alpha - \mu_{t+1}} \right) I_{t+1}, \tag{9.34}$$

where $\mu_{t+1} = E(L_{t+1} \mid \mathcal{F}_t)$. In the event that there is a VaR violation $\{L_{t+1} >
\mathrm{VaR}^t_\alpha\}$, the violation residual $K_{t+1}$ compares the actual size of the violation $L_{t+1}$
with its expected size conditional on information up to time $t$, given by $\mathrm{ES}^t_\alpha$; if there
is no VaR violation, the violation residual is 0. The reason for scaling the residual
by $(\mathrm{ES}^t_\alpha - \mu_{t+1})$ will become apparent below.

It follows from Lemma 2.13 that, for a continuous loss distribution, the identity


$$E(K_{t+1} \mid \mathcal{F}_t) = 0$$


is satisfied so that the series of violation residuals $(K_t)$ forms a martingale-difference
series. Under stronger assumptions we can use this as the basis for a backtest of
expected shortfall estimates. Let us assume that the underlying process generating
the losses $(L_t)$ satisfies, for all $t$, equations of the form $L_t = \mu_t + \sigma_t Z_t$, where $\mu_t$
is an $\mathcal{F}_{t-1}$-measurable conditional mean term, $\sigma_t$ is an $\mathcal{F}_{t-1}$-measurable volatility
and the $(Z_t)$ are SWN(0, 1) innovations. This assumption would be satisfied, for
example, by an ARMA process with GARCH errors, which mimics many of the
essential features of financial return data. Under this assumption we have that $\mathrm{ES}^t_\alpha =
\mu_{t+1} + \sigma_{t+1} \mathrm{ES}_\alpha(Z)$, where $\mathrm{ES}_\alpha(Z)$ denotes the expected shortfall of the innovation
distribution. We can then calculate that


$$K_{t+1} = \left( \frac{Z_{t+1} - \mathrm{ES}_\alpha(Z)}{\mathrm{ES}_\alpha(Z)} \right) I_{\{Z_{t+1} > q_\alpha(Z)\}},$$


so that the sequence of violation residuals $(K_t)$ forms a process of iid variables with
mean 0 and an atom of probability mass of size $\alpha$ at zero.
Empirical violation residuals are formed, in the obvious way, by calculating 


^ Lisi - Es’ a 
Ri = (Fee in, (9.35) 
a  Mt+l 
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where ES) denotes the estimated value of ES$, at time t, byi is the VaR violation 
indicator defined in Section 9.3.1, and A,+1 is an estimate of the conditional mean. In 
practice, the conditional mean jz;+1 is not always estimated (particularly in EWMA- 
based methods), and when it is estimated it is often close to zero; for this reason we 
might simplify the calculations by setting A:+1 = 0. 

We expect the empirical violation residuals to behave like realizations of iid 
variables from a distribution with mean 0. We can test the hypothesis that the non- 
zero violation residuals have a mean of zero. The simplest approach is to use a 
t-test, and this is the option we choose in Section 9.3.4. It is also possible to use 
a bootstrap test that makes no assumption about the underlying distribution of the 
violation residuals (see Notes and Comments). 
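
The calculation of the empirical violation residuals and the zero-mean test can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the implementation behind the results of Section 9.3.4: the function names `violation_residuals` and `t_statistic` and the toy numbers are hypothetical, and the conditional mean estimate defaults to zero, as in the simplification mentioned above.

```python
import math

def violation_residuals(losses, var_est, es_est, mu_est=None):
    """Return the non-zero empirical violation residuals.

    losses[i] is the realized loss on day t + 1; var_est[i], es_est[i] and
    mu_est[i] are the estimates made at time t (hypothetical inputs).
    """
    if mu_est is None:
        mu_est = [0.0] * len(losses)  # simplification: conditional mean set to 0
    residuals = []
    for loss, var, es, mu in zip(losses, var_est, es_est, mu_est):
        if loss > var:  # VaR violation, so the indicator equals 1
            residuals.append((loss - es) / (es - mu))
    return residuals

def t_statistic(sample):
    """One-sample t statistic for the null hypothesis of mean zero."""
    n = len(sample)
    mean = sum(sample) / n
    var = sum((x - mean) ** 2 for x in sample) / (n - 1)
    return mean / math.sqrt(var / n)

# Toy data: constant estimates VaR = 2.0 and ES = 2.5 on five days.
losses = [0.5, 2.4, 1.0, 3.0, 2.2]
residuals = violation_residuals(losses, [2.0] * 5, [2.5] * 5)
t_stat = t_statistic(residuals)  # refer to a t distribution with n - 1 df
```

Only the days with a violation enter the test, so the effective sample size is the number of violations rather than the number of backtesting days.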


9.3.3 Elicitability and Comparison of Risk Measure Estimates 


We now present a more recent approach to backtesting that is useful for comparing 
sets of risk-measure estimates derived using different methodologies. This approach 
is founded on the observation that the problem of estimating financial risk measures 
for the next time period is a special case of the general statistical problem of esti- 
mating statistics of a predictive or forecasting distribution. It is therefore natural to 
use ideas from the forecasting literature in backtesting. 

In forecasting it is common to make predictions based on the idea of minimizing 
a scoring function or prediction error function. For example, if we wish to minimize 
the squared prediction error, it is well known that we should use the mean of the 
predictive distribution as our forecast; if we wish to minimize the absolute prediction 
error, we use the median of the predictive distribution. 
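
A small numerical check makes this concrete. The sketch below uses a hypothetical five-point sample as the predictive distribution and a brute-force grid search (the helper `argmin_on_grid` is our own illustrative name) to find the forecast with the smallest total score under each of the two scoring functions.

```python
def argmin_on_grid(score, sample, grid):
    """Return the grid point y that minimizes sum_i score(y, x_i)."""
    return min(grid, key=lambda y: sum(score(y, x) for x in sample))

sample = [1.0, 2.0, 3.0, 4.0, 10.0]        # mean 4.0, median 3.0
grid = [i / 100 for i in range(0, 1001)]   # candidate forecasts 0.00, ..., 10.00

# Squared error is minimized by the mean, absolute error by the median.
best_squared = argmin_on_grid(lambda y, x: (y - x) ** 2, sample, grid)
best_absolute = argmin_on_grid(lambda y, x: abs(y - x), sample, grid)
```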

The mean and median of the predictive distribution are known as elicitable stat- 
istical functionals of the distribution because they provide optimal forecasts under 
particular choices of scoring function. When a statistic is elicitable, there are natural 
ways of comparing different sets of estimates of that statistic using empirical scores. 

For example, let $(X_t)$ denote a time series and suppose we use two procedures
A and B to estimate the conditional mean $\mu_{t+1} = E(X_{t+1} \mid \mathcal{F}_t)$ at different time
points, based on data up to time $t$, resulting in estimates $\hat{\mu}_{t+1}^{(j)}$, $j \in \{A, B\}$,
$t = 1, \ldots, m$. The conditional mean $\mu_{t+1}$ is known to be the optimal prediction of $X_{t+1}$
under a squared error scoring function. We therefore compare our estimates with
the actual realized values $X_{t+1}$ by computing squared differences. The superior
estimation procedure $j$ will tend to be the one that gives the lowest value of the total
squared prediction error $\sum_{t=1}^{m} (X_{t+1} - \hat{\mu}_{t+1}^{(j)})^2$.
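
As a hedged illustration of such a comparison, the following sketch simulates an AR(1) series and scores two hypothetical procedures: procedure A forecasts with the true conditional mean (the coefficient `phi` is assumed known purely for the example) and procedure B always forecasts the unconditional mean of zero. All names and parameter values are assumptions made for the illustration.

```python
import random

random.seed(0)
phi = 0.7                       # AR(1) coefficient, assumed known in this sketch
x = [0.0]
for _ in range(2000):
    x.append(phi * x[-1] + random.gauss(0.0, 1.0))

def total_squared_error(forecasts, realized):
    """Total squared prediction error of a set of one-step forecasts."""
    return sum((r - f) ** 2 for f, r in zip(forecasts, realized))

realized = x[1:]
forecast_a = [phi * value for value in x[:-1]]   # procedure A: conditional mean
forecast_b = [0.0] * len(realized)               # procedure B: unconditional mean
score_a = total_squared_error(forecast_a, realized)
score_b = total_squared_error(forecast_b, realized)
```

Procedure A should obtain the lower total score, since it targets the functional that the squared error scoring function elicits.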

We now give a more formal treatment of basic concepts from elicitability theory 
and show in particular how the ideas relate to the problem of estimating value-at-risk. 


Elicitability theory. A law-invariant risk measure $\varrho$ defined on a space of random
variables $\mathcal{M}$ can also be viewed as a statistical functional $T$ defined on a space of
distribution functions $\mathcal{X}$. If $L \in \mathcal{M}$ has df $F_L \in \mathcal{X}$, then $\varrho$ and $T$ are linked by
$\varrho(L) = T(F_L)$; for example, $\mathrm{VaR}_\alpha(L) = F_L^{\leftarrow}(\alpha)$, so the functional in this case is
the generalized inverse at $\alpha$.

The theory of elicitability is usually presented as a theory of statistical functionals
of distribution functions. Our account is based on the presentation in Bellini
and Bignozzi (2013). Note that we follow Bellini and Bignozzi in restricting our 
attention to real-valued functionals; this is more natural for the application to finan- 
cial risk measures but differs from the more common presentation in the statistical 
forecasting literature where set-valued functionals are allowed (see, for example, 
Gneiting 2011). 

Elicitable statistical functionals are functionals that minimize expected scores 
where the expected scores are calculated using scoring functions. These quantify 
the discrepancy between a forecast and a realized value from the distribution. The 
formal definitions of scoring functions and elicitable functionals are as follows. 


Definition 9.6. A scoring function is a function $S : \mathbb{R} \times \mathbb{R} \to [0, \infty)$ satisfying, for
any $y, l \in \mathbb{R}$:


(i) $S(y, l) \ge 0$ and $S(y, l) = 0$ if and only if $y = l$;
(ii) $S(y, l)$ is increasing for $y > l$ and decreasing for $y < l$;
(iii) $S(y, l)$ is continuous in $y$.


Definition 9.7. A real-valued statistical functional $T$ defined on a space of distribution
functions $\mathcal{X}$ is said to be elicitable on $\mathcal{X}_T \subseteq \mathcal{X}$ if there exists a scoring function
$S$ such that, for every $F \in \mathcal{X}_T$,


(1) $\int_{\mathbb{R}} S(y, l) \, dF(l) < \infty$, $\forall y \in \mathbb{R}$,
(2) $T(F) = \arg\min_{y \in \mathbb{R}} \int_{\mathbb{R}} S(y, l) \, dF(l)$.

In this case the scoring function $S$ is said to be strictly consistent for $T$.


In the context of risk measures, if $L$ is an rv with loss distribution function $F_L$,
then an elicitable risk measure minimizes


$$E(S(y, L)) = \int_{\mathbb{R}} S(y, l) \, dF_L(l) \tag{9.36}$$


with respect to $y$ for every $F_L \in \mathcal{X}_T$, where $\mathcal{X}_T$ is the set of dfs for which the
integral in (9.36) is defined.

For example, let $\mathcal{X}$ be the set of dfs of integrable random variables. The mean
$E(L) = \int_{\mathbb{R}} l \, dF_L(l)$ is elicitable on the space $\mathcal{X}_T$ of distribution functions with
finite variance. This is clear because it minimizes (9.36) for the strictly consistent
scoring function $S(y, l) = (y - l)^2$, as may be easily verified.
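
The verification is a short calculation. Writing $\mu = E(L)$ and assuming only that $L$ has finite variance, the expected squared error score decomposes as follows.

```latex
\begin{align*}
E\bigl((y - L)^2\bigr)
  &= E\bigl((y - \mu)^2 + 2 (y - \mu)(\mu - L) + (\mu - L)^2\bigr) \\
  &= (y - \mu)^2 + \operatorname{var}(L),
\end{align*}
```

The cross term vanishes because $E(\mu - L) = 0$. The right-hand side is finite for every $y$ and is minimized uniquely at $y = \mu$, which is what the two conditions of the definition of elicitability require.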


Application to the VaR and expectile risk measures. We now consider the VaR and 
expectile risk measures. The former is elicitable for strictly increasing distribution 
functions and the latter is elicitable in general, subject of course to the moment 
conditions imposed by Definition 9.7. We summarize this information and give 
strictly consistent scoring functions in the following two propositions.


Proposition 9.8. For any $0 < \alpha < 1$ the statistical functional $T(F_L) = F_L^{\leftarrow}(\alpha)$
is elicitable on the set of strictly increasing distribution functions with finite mean.
The scoring function


$$S_\alpha^q(y, l) = |I_{\{y \ge l\}} - \alpha| \, |l - y| \tag{9.37}$$


is strictly consistent for $T$.


Proof. For $L$ with df $F_L$, the expected score $E(S_\alpha^q(y, L))$ is a continuous function
that is differentiable at all the points of continuity of $F_L$. The derivative is


$$\begin{aligned}
\frac{d}{dy} E(S_\alpha^q(y, L))
  &= \frac{d}{dy} \int_{-\infty}^{\infty} |I_{\{y \ge x\}} - \alpha| \, |y - x| \, dF_L(x) \\
  &= \frac{d}{dy} \int_{-\infty}^{y} (1 - \alpha)(y - x) \, dF_L(x) + \frac{d}{dy} \int_{y}^{\infty} \alpha (x - y) \, dF_L(x) \\
  &= (1 - \alpha) \int_{-\infty}^{y} dF_L(x) - \alpha \int_{y}^{\infty} dF_L(x) \\
  &= F_L(y) - \alpha.
\end{aligned}$$


There are two cases to consider. If there exists a point $y$ such that $F_L(y) = \alpha$, then
$y = F_L^{\leftarrow}(\alpha)$ and $y$ clearly minimizes $E(S_\alpha^q(y, L))$. If, on the other hand, the set
$\{y : F_L(y) = \alpha\}$ is empty, then it must be the case that there is a point $y$ at which the
distribution function $F_L$ jumps and $F_L(x) - \alpha < 0$ for $x < y$ and $F_L(x) - \alpha > 0$
for $x > y$. It follows again that $y = F_L^{\leftarrow}(\alpha)$ and that $y$ is the unique minimizer of
$E(S_\alpha^q(y, L))$.
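
The second case of the proof can be checked numerically. In the sketch below the loss distribution is a hypothetical discrete uniform distribution on 1, ..., 10 and we take α = 0.75, so that no point satisfies F(y) = α and the generalized inverse is the jump point 8. The grid search and all names are illustrative assumptions.

```python
def quantile_score(y, l, alpha):
    """Quantile scoring function: |1{y >= l} - alpha| * |l - y|."""
    indicator = 1.0 if y >= l else 0.0
    return abs(indicator - alpha) * abs(l - y)

def expected_score(y, sample, alpha):
    """Expected score under the empirical distribution of the sample."""
    return sum(quantile_score(y, l, alpha) for l in sample) / len(sample)

sample = [float(k) for k in range(1, 11)]   # ten equally likely losses 1, ..., 10
alpha = 0.75                                # F(7) = 0.7 < alpha < 0.8 = F(8)
grid = [i / 10 for i in range(0, 111)]      # candidate forecasts 0.0, ..., 11.0
minimizer = min(grid, key=lambda y: expected_score(y, sample, alpha))
```

The expected score decreases with slope F(y) - α = -0.05 just below 8 and increases with slope 0.05 just above it, so the grid search returns the generalized inverse.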


Proposition 9.9. For any $0 < \alpha < 1$ the statistical functional $T$ corresponding to
the expectile risk measure $e_\alpha$ is elicitable for all loss distributions $F_L$ with finite
variance. The scoring function


$$S_\alpha^e(y, l) = |I_{\{y \ge l\}} - \alpha| \, (l - y)^2 \tag{9.38}$$


is strictly consistent for $T$.


Proof. This follows easily from (8.31) and Definition 8.21 in Section 8.2.2. 
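
To see the proposition at work, note that differentiating the expected score for (9.38) with respect to y gives the first-order condition α E((L - y)^+) = (1 - α) E((y - L)^+). The sketch below solves this condition by bisection for a hypothetical five-point sample and confirms that the solution does not have a larger score than nearby forecasts. The function names and the sample are assumptions made for the illustration; this is not the argument via (8.31) used in the proof.

```python
def expectile(sample, alpha, iterations=100):
    """Solve alpha * E(L - y)^+ = (1 - alpha) * E(y - L)^+ for y by bisection."""
    def gap(y):
        upper = sum(max(l - y, 0.0) for l in sample)
        lower = sum(max(y - l, 0.0) for l in sample)
        return alpha * upper - (1.0 - alpha) * lower   # decreasing in y

    lo, hi = min(sample), max(sample)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if gap(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def expectile_score(y, sample, alpha):
    """Total expectile score: sum of |1{y >= l} - alpha| * (l - y)^2."""
    return sum(abs((1.0 if y >= l else 0.0) - alpha) * (l - y) ** 2 for l in sample)

sample = [0.0, 1.0, 2.0, 3.0, 10.0]
e_half = expectile(sample, 0.5)   # alpha = 0.5 recovers the mean of the sample
e_high = expectile(sample, 0.9)   # lies above the mean for alpha > 0.5
```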


Characterizations of elicitable risk measures. The expectile is thus both an elicitable
and a coherent risk measure (provided $\alpha \ge 0.5$); it is in fact the only risk
measure to have both these properties, as shown by Ziegel (2015). Bellini and Bignozzi
(2013) have provided an elegant result that characterizes all the elicitable
risk measures that have the extra properties required for convexity, coherence and
comonotone additivity.


Theorem 9.10 (Bellini and Bignozzi (2013)). Let $T(F_L)$ be the statistical functional
corresponding to a law-invariant risk measure that is both monotonic and
translation invariant. Then


(a) $T(F_L)$ is convex and elicitable if and only if it is a risk measure based on a
(convex) loss function (see Example 8.8),

(b) $T(F_L)$ is coherent and elicitable if and only if it is an expectile $e_\alpha$ with $\alpha \ge 0.5$,
and

(c) $T(F_L)$ is coherent, comonotone additive and elicitable if and only if it coincides
with the expected loss.


In particular, we note that the expected shortfall risk measure cannot be elicitable 
according to this theorem. There are other ways of demonstrating this more directly 
(see, for example, Gneiting 2011). 


Computing empirical scores. As indicated in the introduction to this section, if
we want to compare the performance of different methods of estimating elicitable
functionals, we can use the strictly consistent scoring functions suggested by
elicitability theory. From Proposition 9.8 we know that $\mathrm{VaR}_\alpha^t$, the quantile of
$F_{L_{t+1} \mid \mathcal{F}_t}$, minimizes $E(S_\alpha^q(y, L_{t+1}) \mid \mathcal{F}_t)$.

Suppose, as in Section 9.3.1, that we compute estimates $\widehat{\mathrm{VaR}}_\alpha^t$ of $\mathrm{VaR}_\alpha^t$ on days
$t = 1, \ldots, m$, based on information up to time $t$, and backtest each estimate on day
$t + 1$. A natural score is given by


$$\sum_{t=1}^{m} S_\alpha^q(\widehat{\mathrm{VaR}}_\alpha^t, L_{t+1}).$$


If we compute this quantity for different estimation methods, then the methods that
give the most accurate estimates of the conditional quantiles $\mathrm{VaR}_\alpha^t$ will tend to give
the smallest scores.

Unfortunately, since expected shortfall is not an elicitable functional, there is no
natural empirical score for comparing sets of estimates of expected shortfall.
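
The empirical VaR score is straightforward to compute. The following sketch compares two hypothetical sets of constant VaR estimates on ten made-up losses; the function names and the data are assumptions chosen so that the arithmetic can be followed by hand.

```python
def quantile_score(var_estimate, loss, alpha):
    """Score of one VaR estimate against the realized loss."""
    indicator = 1.0 if var_estimate >= loss else 0.0
    return abs(indicator - alpha) * abs(loss - var_estimate)

def total_score(var_estimates, losses, alpha):
    """Empirical score: sum of the daily scores over the backtesting period."""
    return sum(quantile_score(v, l, alpha) for v, l in zip(var_estimates, losses))

losses = [-1.0, 0.5, 2.0, -0.3, 3.0, 0.1, -2.0, 1.2, 0.4, -0.6]
alpha = 0.9
score_a = total_score([2.0] * 10, losses, alpha)   # method A: estimate 2.0 every day
score_b = total_score([0.5] * 10, losses, alpha)   # method B: estimate 0.5 every day
```

Method A, whose estimate sits at the empirical 90% quantile of the losses, obtains the smaller score. Violations are penalized with weight α and non-violations with weight 1 - α, so a method that sets VaR too low is punished heavily on violation days.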


9.3.4 Empirical Comparison of Methods Using Backtesting Concepts 


In this section we apply various VaR estimation methods to the portfolio of a hypothetical
investor in international equity indices and backtest the resulting VaR estimates.
The methods we compare belong to the general categories of variance-covariance
and historical-simulation methods and are a mix of unconditional and
conditional approaches.

The investor is assumed to have domestic currency sterling (GBP) and to invest 
in the FTSE100 Index, the S&P 500 Index and the SMI (Swiss Market Index). The 
investor thus has currency exposure to US dollars (USD) and Swiss francs (CHF), 
and the value of the portfolio is influenced by five risk factors (three log index values 
and two log exchange rates). The corresponding risk-factor time series for the period 
2000-2012 are shown in Figure 9.4. 

Figure 9.4. Time series of risk-factor values. These are (a) the FTSE 100, (b) the S&P 500,
(c) the SMI, (d) the GBP/USD exchange rate and (e) the GBP/CHF exchange rate for the
period 2000-2012. The final picture shows the corresponding historical simulation data (9.26)
for the portfolio of Section 9.3.4. The vertical dashed line marks the date on which Lehman
Brothers filed for bankruptcy.

On any day $t$ we standardize the total portfolio value $V_t$ in sterling to be 1 and
assume that the portfolio weights (the proportions of this total value invested in each
of the FTSE 100, the S&P 500 and the SMI) are 30%, 40% and 30%, respectively.
Using similar reasoning to that in Example 2.1, it may be verified that the loss
operator is


$$l_{[t]}(x) = 1 - (0.3 e^{x_1} + 0.4 e^{x_2 + x_4} + 0.3 e^{x_3 + x_5}),$$


and its linearized version is


$$l_{[t]}^{\Delta}(x) = -(0.3 x_1 + 0.4 (x_2 + x_4) + 0.3 (x_3 + x_5)),$$


where $x_1$, $x_2$ and $x_3$ represent log-returns on the three indices, and $x_4$ and $x_5$ are
log-returns on the GBP/USD and GBP/CHF exchange rates.
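
The two loss operators are simple to evaluate. The sketch below codes the exact operator and its linearized version for the 30%, 40% and 30% weights and applies both to one hypothetical vector of daily risk-factor changes; the function names and the numerical input are illustrative assumptions.

```python
import math

WEIGHTS = (0.3, 0.4, 0.3)   # FTSE 100, S&P 500, SMI

def loss(x):
    """Exact loss for risk-factor changes x = (x1, ..., x5)."""
    x1, x2, x3, x4, x5 = x
    w1, w2, w3 = WEIGHTS
    return 1.0 - (w1 * math.exp(x1) + w2 * math.exp(x2 + x4) + w3 * math.exp(x3 + x5))

def linearized_loss(x):
    """First-order approximation of the loss around x = 0."""
    x1, x2, x3, x4, x5 = x
    w1, w2, w3 = WEIGHTS
    return -(w1 * x1 + w2 * (x2 + x4) + w3 * (x3 + x5))

# Hypothetical one-day log-returns: three indices, then GBP/USD and GBP/CHF.
x = (-0.01, -0.02, 0.005, 0.003, -0.004)
exact = loss(x)
approx = linearized_loss(x)
```

Because the exponential function lies above its tangent at zero, the linearized loss is never smaller than the exact loss; for daily changes of this size the two differ by less than one basis point of portfolio value.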

Our objective is to calculate VaR estimates at the 95% and 99% levels for all trading
days in the period 2005-12. Where local public holidays take place in individual
markets (e.g. the Fourth of July in the US), we record artificial zero returns for the
market in question, thus preserving around 258 days of risk-factor return data in
each year. We use the last 1000 days of historical data $X_{t-999}, \ldots, X_t$ to make all
VaR estimates for day $t + 1$ using the following methods.


VC. The variance-covariance method assuming multivariate Gaussian risk-factor
changes and using the multivariate EWMA method to estimate the conditional
covariance matrix of risk-factor changes as described in Section 9.2.2.


HS. The standard unconditional historical-simulation method as described in
Section 9.2.3.


HS-GARCH. The univariate dynamic approach to historical simulation in which a
GARCH(1, 1) model with a constant conditional mean term and Gaussian innovations
is fitted to the historically simulated losses to estimate the volatility of the
next day's loss (see Section 9.2.4).


HS-GARCH-t. A similar method to HS-GARCH but Student t innovations are
assumed in the GARCH model.


HS-GARCH-EVT. A similar method to HS-GARCH and HS-GARCH-t but the 
model parameters are estimated by QML and EVT is applied to the model resid- 
uals; see strategy (2) in the univariate approach to dynamic historical simulation 
described in Section 9.2.4 and see also Section 5.2.6. 


HS-MGARCH. The multivariate dynamic approach to historical simulation in 
which GARCH(1, 1) models with constant conditional mean terms are fitted to 
each time series of risk-factor changes to estimate volatilities (see Section 9.2.4). 


HS-MGARCH-EVT. A similar method to HS-MGARCH but EVT estimators 
rather than simple empirical estimators are applied to the data constructed in (9.28) 
to calculate risk-measure estimates. 


This collection of methods is of course far from complete and is merely meant as an 
indication of the kinds of strategies that are possible. In particular, we have confined 
our interest to rather simple GARCH models and not added, for example, asymmetric 
innovation distributions, leverage effects (see Section 4.2.3) or regime-switching 
models, which can often further improve the performance of such methods. 
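
To fix ideas, the simplest of these strategies, unconditional historical simulation with a rolling window, can be sketched as follows. This is a toy version under stated assumptions: the empirical quantile is taken to be the generalized inverse of the empirical distribution function of the losses in the window (one common convention, which may differ in detail from the estimator of Section 9.2.3), and the function names and data are hypothetical.

```python
import math

def hs_var(window_losses, alpha):
    """Empirical alpha-quantile: generalized inverse of the empirical df."""
    ordered = sorted(window_losses)
    # Smallest k with k / n >= alpha; the small offset guards against rounding.
    k = math.ceil(len(ordered) * alpha - 1e-9)
    return ordered[max(k, 1) - 1]

def count_violations(losses, window, alpha):
    """Rolling backtest: each VaR estimate uses the previous `window` losses."""
    violations = 0
    for t in range(window, len(losses)):
        var_estimate = hs_var(losses[t - window:t], alpha)
        if losses[t] > var_estimate:
            violations += 1
    return violations

toy_losses = [1.0] * 5 + [2.0] + [1.0] * 5
n_violations = count_violations(toy_losses, window=5, alpha=0.8)
```

In the study described here the window would contain 1000 historically simulated losses rather than five.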

Table 9.1 contains the VaR violation counts for estimates of the 95% and 99% 
VaR for each of the methods. The violation counts have been broken down by year 
and the final column shows the total number of violations over the eight-year period. 
In each cell a binomial test has been carried out using the score statistic (9.33), and 
test results that are significant at the 5% level are indicated by italics. 
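
The binomial score test is easy to reproduce. The sketch below assumes that the score statistic takes the usual form for a binomial proportion, namely the violation count minus its expected value m(1 - α), divided by the square root of mα(1 - α), and that it is referred to a standard normal distribution with a two-sided alternative. The function name is hypothetical; the two calls use counts taken from Table 9.1.

```python
import math
from statistics import NormalDist

def binomial_score_test(n_violations, n_days, alpha):
    """Score statistic and two-sided p-value for a violation count."""
    expected = n_days * (1.0 - alpha)
    z = (n_violations - expected) / math.sqrt(n_days * alpha * (1.0 - alpha))
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p_value

z_vc, p_vc = binomial_score_test(116, 2065, 0.95)   # VC, all years, 95% VaR
z_hs, p_hs = binomial_score_test(49, 259, 0.95)     # HS, year 2008, 95% VaR
```

Under these assumptions the overall VC count at the 95% level is not significant at the 5% level, whereas the forty-nine HS violations in 2008 are overwhelmingly significant.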

At the 95% level the HS-MGARCH and HS-MGARCH-EVT methods clearly 
give the best overall results over the entire period; the former yields exactly the 
expected number of violations and the latter yields just one more; the third-best 
method, in terms of closeness of the number of violations to the expected number, is 
the HS-GARCH-EVT method. At the 99% level the HS-MGARCH, HS-MGARCH- 
EVT and HS-GARCH-EVT are again the best methods, although it is difficult 
to pick a favourite in terms of violation counts only. While HS-MGARCH gives 
insignificant results in every year period, the HS-MGARCH-EVT and HS-GARCH- 
EVT methods come closer to the expected number of overall violations; in fact, the 
HS-MGARCH method gives too few violations overall. 

The volatile years 2007 and 2008 are problematic for most methods, with every 
method yielding more violations than expected. However, the VC method gives an 
insignificant result in both years at the 95% level, and the HS-MGARCH method 
gives an insignificant result in both years at the 99% level. 

Table 9.1. Numbers of violations of the 95% and 99% VaR estimate calculated using various
methods, as described in Section 9.3.4. Figures in italics show significant discrepancies
between observed and expected violation counts according to a binomial score test at the 5%
level.

Year                        2005  2006  2007  2008  2009  2010  2011  2012   All
Trading days                 258   257   258   259   258   259   258   258  2065

Results for 95% VaR
Expected no. of violations    13    13    13    13    13    13    13    13   103
VC                             8    16    17    19    13    15    14    14   116
HS                             0     6    28    49    19     6    10     1   119
HS-GARCH                       9    13    22    22    13    14     9    15   117
HS-GARCH-t                     9    14    23    22    14    15    10    15   122
HS-GARCH-EVT                   5    13    22    21    13    13     9    13   109
HS-MGARCH                      5    14    21    19    12     9    11    12   103
HS-MGARCH-EVT                  5    14    22    18    13    10    10    12   104

Results for 99% VaR
Expected no. of violations   2.6   2.6   2.6   2.6   2.6   2.6   2.6   2.6    21
VC                             2     8     8     8     2     4     5     6    43
HS                             0     0    10    22     2     0     2     0    36
HS-GARCH                       2     8     8    10     5     4     3     3    43
HS-GARCH-t                     2     8     6     8     1     4     2     1    32
HS-GARCH-EVT                   0     6     4     7     1     1     2     1    22
HS-MGARCH                      0     4     4     5     0     1     2     1    17
HS-MGARCH-EVT                  0     4     5     6     0     1     2     1    19


We now compare the results of the VaR backtests using elicitability theory. The
results are contained in the first two columns of Table 9.2. According to this metric,
the HS-MGARCH-EVT method gives the best VaR estimates at the 95% level, while
the HS-MGARCH method gives the best VaR results at the 99% level. At the 95%
level the second lowest score is given by the HS-MGARCH method. At the 99%
level the second lowest score is given by the HS-MGARCH-EVT method and the
third lowest score is given by the HS-GARCH-EVT method.

It is also noticeable that the standard HS method gives very poor scores at both 
levels. As an unconditional method it is not well suited to giving estimates of quan- 
tiles of the conditional loss distribution; this is also evident from the violation counts 
in Table 9.1. In Figure 9.5 we show the second half of the year 2008: a period at the 
height of the 2007-9 credit crisis and one that includes the date on which Lehman 
Brothers filed for bankruptcy (15 September 2008). The plot shows actual losses as 
bars with risk-measure estimates for the HS and HS-MGARCH methods superim- 
posed; violations are indicated by circles (HS) and crosses (HS-MGARCH). 

Throughout the volatile year 2008, the standard historical-simulation method
performs very poorly: there are forty-nine violations of the 95% VaR estimate and
twenty-two violations of the 99% VaR estimate. The HS-MGARCH method, being
a conditional method, is able to respond to the changes in volatility better and
consequently gives nineteen and five violations. In the plot of the second half of the
year we see sixteen of the twenty-two violations of the 99% VaR estimate for the
HS method and one of the five violations for the HS-MGARCH method.

Table 9.2. Scores based on elicitability theory for the VaR backtests (to four significant
figures) and p-values (to two decimal places) for expected shortfall tests as described in
Section 9.3.4; figures in italics indicate failure of the expected shortfall test.

                 VaR score comparison                Violation residual test
Method           95% VaR (x 10°)   99% VaR (x 10°)   95% ES    (n)   99% ES    (n)
VC                    1081              308.1          0.00    116     0.05     43
HS                    1399              466.4          0.02    119     0.25     36
HS-GARCH              1072              306.8          0.00    117     0.05     43
HS-GARCH-t            1074              299.6          0.12    122     0.68     32
HS-GARCH-EVT          1074              295.7          0.59    109     0.65     22
HS-MGARCH             1064              287.8          0.99    103     0.55     17
HS-MGARCH-EVT         1063              289.7          0.83    104     0.94     19

Figure 9.5. Daily losses for the second half of 2008 together with 99% VaR estimates and
corresponding violations for the HS and HS-MGARCH methods. The HS VaR estimates
are indicated by a dotted line and the corresponding violations are indicated by circles. The
HS-MGARCH estimates are given by a dashed line; the only violation for this method in the
time period occurred on 15 September 2008 (the day on which Lehman Brothers filed for
bankruptcy) and is marked by a dashed vertical line and a crossed circle. For more information
see Section 9.3.4.

Figure 9.6. Q-Q plot of the spacings (in days) between violations of the estimates of the
99% VaR obtained by the HS-MGARCH method against the corresponding quantiles of a
standard exponential distribution (see Section 9.3.4).

In Figure 9.6 we address the hypothesis that VaR violations should form a
Bernoulli trials process with geometrically distributed (or approximately exponentially
distributed) spacings between violations. The figure shows a Q-Q plot of the
spacings (in days) between violations of the estimates of the 99% VaR obtained
by the HS-MGARCH method against the corresponding quantiles of a standard
exponential distribution; the exponential hypothesis is plausible on the basis of this
picture.
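
The points of such a Q-Q plot can be computed as in the following sketch. The violation indicators are a made-up toy sequence, the plotting positions (i - 0.5)/n are one common convention, and the function names are hypothetical; nothing here reproduces the data behind Figure 9.6.

```python
import math

def violation_spacings(indicators):
    """Numbers of days between successive violations."""
    times = [t for t, flag in enumerate(indicators) if flag]
    return [later - earlier for earlier, later in zip(times, times[1:])]

def exponential_qq_points(spacings):
    """Pairs (standard exponential quantile, ordered spacing) for a Q-Q plot."""
    n = len(spacings)
    ordered = sorted(spacings)
    theoretical = [-math.log(1.0 - (i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, ordered))

indicators = [0, 1, 0, 0, 1, 1, 0, 0, 0, 1]   # toy violation indicators
spacings = violation_spacings(indicators)      # [3, 1, 4]
points = exponential_qq_points(spacings)
```

If the spacings are approximately exponential, the points lie close to a straight line through the origin; its slope reflects the mean spacing, which is 1/(1 - α) days for a correctly calibrated VaR estimate at level α.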

In Table 9.2 we also give results for the backtest of expected shortfall based on vio- 
lation residuals described in Section 9.3.2. For estimates of the 95% expected short- 
fall, three of the seven methods fail the zero-mean test for the violation residuals; the 
methods that do not fail are HS-GARCH-t, HS-GARCH-EVT, HS-MGARCH and 
HS-MGARCH-EVT. For estimates of the 99% expected shortfall, results appear to 
be even better; five of the methods give insignificant results (HS, HS-GARCH-t, 
HS-GARCH-EVT, HS-MGARCH, HS-MGARCH-EVT) while the other two give 
a marginally significant result (p = 0.05). However, at the 99% level we have only 
a small number of violation residuals to test, as indicated in the table by the column 
marked (n). 

On the basis of all results, the HS-MGARCH and HS-MGARCH-EVT methods 
give the best overall performance in the example we have considered, with the 
HS-GARCH-EVT method also performing well. 


9.3.5  Backtesting the Predictive Distribution 


As well as backtesting VaR and expected shortfall we can also devise tests that
assess the overall quality of the estimated conditional loss distributions from which
the risk-measure estimates are derived. Of course, our primary interest focuses
on the measures of tail risk, but it is still useful to backtest our estimates of the
whole predictive distribution to obtain additional confirmation of the risk-measure
estimation procedure.

Suppose our objective at every time $t$ is to estimate the conditional loss distribution
$F_{L_{t+1} \mid \mathcal{F}_t}$ and let $U_{t+1} = F_{L_{t+1} \mid \mathcal{F}_t}(L_{t+1})$. For a given loss process $(L_t)$, consider
the formation of a process $(U_t)$ by applying this transformation. As in Section 9.3.2,
let us assume that the underlying process generating the losses $(L_t)$ satisfies, for all
$t$, equations of the form $L_t = \mu_t + \sigma_t Z_t$, where $\mu_t$ is an $\mathcal{F}_{t-1}$-measurable conditional
mean term, $\sigma_t$ is an $\mathcal{F}_{t-1}$-measurable volatility and the $(Z_t)$ are SWN(0, 1)
innovations.

Under this assumption it follows easily from the fact that


$$F_{L_{t+1} \mid \mathcal{F}_t}(l) = P(\mu_{t+1} + \sigma_{t+1} Z_{t+1} \le l \mid \mathcal{F}_t) = G_Z((l - \mu_{t+1}) / \sigma_{t+1})$$


that $U_{t+1} = G_Z(Z_{t+1})$, where $G_Z$ denotes the df of the innovations, so $(U_t)$ is a strict
white noise process. Moreover, if $G_Z$ is continuous, then the stationary or unconditional
distribution of $(U_t)$ must be standard uniform (see Proposition 7.2).

In actual applications we estimate $F_{L_{t+1} \mid \mathcal{F}_t}$ from data up to time $t$ and we backtest
our estimates by forming $\hat{U}_{t+1} = \hat{F}_{L_{t+1} \mid \mathcal{F}_t}(L_{t+1})$ on day $t + 1$. Suppose we
estimate the predictive distribution on days $t = 0, \ldots, m - 1$ and form backtesting
data $\hat{U}_1, \ldots, \hat{U}_m$; we expect these to behave like a sample of iid uniform data.
The distributional assumption can be assessed by standard goodness-of-fit tests like
the chi-squared test or the Kolmogorov-Smirnov test (see Section 15.1.2 for references).
It is also possible to form the data $\Phi^{-1}(\hat{U}_1), \ldots, \Phi^{-1}(\hat{U}_m)$, where $\Phi$ is the
standard normal df; these should behave like iid standard normal data (see again
Proposition 7.2) and this can be tested as in Section 3.1.2. The strict white noise
assumption can be tested using the approach described in Section 4.1.3.
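
The transformation is easy to carry out once a predictive model has been chosen. The sketch below assumes, purely for illustration, a Gaussian predictive distribution with known conditional mean and volatility, applies it to simulated losses, and computes the Kolmogorov-Smirnov distance to the uniform distribution by hand. All function names, the simulated data and the volatility pattern are hypothetical.

```python
import random
from statistics import NormalDist

def pit_values(losses, mus, sigmas):
    """Estimated conditional df evaluated at each realized loss.

    A Gaussian predictive distribution N(mu, sigma^2) is assumed here.
    """
    return [NormalDist(mu, sigma).cdf(loss)
            for loss, mu, sigma in zip(losses, mus, sigmas)]

def ks_uniform(us):
    """Kolmogorov-Smirnov distance between the sample and the U(0, 1) df."""
    n = len(us)
    ordered = sorted(us)
    return max(max((i + 1) / n - u, u - i / n) for i, u in enumerate(ordered))

# Simulated losses L_t = sigma_t * Z_t with a known, time-varying volatility.
random.seed(1)
sigmas = [1.0 if t % 2 == 0 else 2.0 for t in range(500)]
mus = [0.0] * 500
losses = [sigma * random.gauss(0.0, 1.0) for sigma in sigmas]

us = pit_values(losses, mus, sigmas)
ks_distance = ks_uniform(us)                 # small when the model is adequate
zs = [NormalDist().inv_cdf(u) for u in us]   # should resemble iid N(0, 1) data
```

For large samples the 5% critical value of the Kolmogorov-Smirnov distance is roughly 1.36 divided by the square root of the sample size, which gives a rough yardstick for `ks_distance`.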


Notes and Comments 


The binomial test for numbers of VaR violations and the geometric test for the times 
between violations can be found in Kupiec (1995); in both cases, a likelihood ratio 
test is recommended. For the binomial test, alternatives are the Wald test and the 
score test (see, for example, Casella and Berger 2002, pp. 493-495). The score 
test in particular seems to give a test at about the right level for VaR probabilities 
$\alpha = 0.99$ and $\alpha = 0.95$ in samples of size $m = 250$ or $m = 500$ (i.e. a test with the
right Type 1 error of falsely rejecting the null hypothesis of binomial behaviour). 
Further papers on testing VaR violations for independent Bernoulli behaviour include 
Christoffersen, Hahn and Inoue (2001) and Christoffersen and Pelletier (2004); the 
latter paper develops a number of tests of exponential behaviour for the durations 
between violations and finds that the likelihood ratio test against a Weibull alternative 
is generally most powerful for detecting clustering of violations. 

There has been a large growth in papers on backtesting, and a good overview of
regulatory implications is given in Embrechts et al. (2014). Our backtesting material
is partly taken from McNeil and Frey (2000), where examples of the binomial test
for violation counts and the test of expected shortfall using exceedance residuals can
be found. In that paper a bootstrap test of the exceedance residuals is proposed as an
alternative to the simple t-test used in this chapter; see Efron and Tibshirani (1994,
p. 224) for a description of the bootstrap hypothesis test. Kerkhof and Melenberg
(2004) describe an econometric framework for backtesting risk-based regulatory
capital. Two interesting papers on backtesting expected shortfall are Acerbi and
Szekely (2014) and Costanzino and Curran (2014).

The relevance of elicitability theory to backtesting is discussed by Gneiting 
(2011), who also provides a proof that expected shortfall is not elicitable based 
on the work of Osband (1985). Further relevant papers on elicitability are Davis 
(2014), Ziegel (2015) and Bellini and Bignozzi (2013). 

The idea of testing the estimate of the predictive distribution may be found in 
Berkowitz (2001, 2002). See also Berkowitz and O’Brien (2002) for a more general 
article on testing the accuracy of the VaR models of commercial banks. Finally, for 
a critical insider view on the use of VaR technology on Wall Street in the early days, 
see the relevant chapters in Brown (2012).


10 Credit Risk


Credit risk is the risk of a loss arising from the failure of a counterparty to honour 
its contractual obligations. This subsumes both default risk (the risk of losses due to 
the default of a borrower or a trading partner) and downgrade risk (the risk of losses 
caused by a deterioration in the credit quality of a counterparty that translates into a 
downgrading in some rating system). Credit risk is omnipresent in the portfolio of a 
typical financial institution. To begin with, the lending and corporate bond portfolios 
are obviously affected by credit risk. Perhaps less obviously, credit risk accompanies 
any over-the-counter (OTC, i.e. non-exchange-guaranteed) derivative transaction 
such as a swap, because the default of one of the parties involved may substantially 
affect the actual pay-off of the transaction. Moreover, there is a specialized market 
for credit derivatives, such as credit default swaps, in which financial institutions are 
active players. Credit risk therefore relates to the core activities of most banks. It is 
also highly relevant to insurance companies, who are exposed to substantial credit 
risk in their investment portfolios and counterparty default risk in their reinsurance 
treaties. 

The management of credit risk at financial institutions involves a range of tasks. To 
begin with, an enterprise needs to determine the capital it should hold to absorb losses 
due to credit risk, for both regulatory and economic capital purposes. It also needs 
to manage the credit risk on its balance sheet. This involves ensuring that portfolios 
of credit-risky instruments are well diversified and that portfolios are optimized 
according to risk-return considerations. The risk profile of the portfolio can also be
improved by hedging risk concentrations with credit derivatives or by transferring 
risk to investors through securitization. Moreover, institutions need to manage their 
portfolio of traded credit derivatives. This involves the tasks of pricing, hedging and 
managing collateral for such trades. Finally, financial institutions need to control 
the counterparty credit risk in their trades and contracts with other institutions. In 
fact, in the aftermath of the 2007-9 financial crisis, counterparty risk management 
became one of the most important issues for financial institutions. 

With these tasks in mind we have split our treatment of credit risk into four 
chapters. In the present chapter we establish the foundations for the analysis of 
credit risk. We introduce the most common credit-risky instruments (Section 10.1), 
discuss various measures of credit quality (Section 10.2) and present models for the 
credit risk of a single firm (Sections 10.3-10.6). Moreover, we study basic
single-name credit derivatives such as credit default swaps.

Chapters 11 and 12 are concerned with portfolio models, and the crucial issue of
dependence between defaults comes to the fore. Chapter 11 treats one-period mod- 
els with a view to capital adequacy and credit risk management issues for portfolios 
of largely non-traded assets. Chapter 12 deals with properties of portfolio credit 
derivatives such as collateralized debt obligations; moreover, we discuss the valua- 
tion of these products in standard copula models. Finally, Chapter 17 is concerned 
with more advanced fully dynamic models of portfolio credit risk. This chapter is 
also the natural place for a detailed discussion of counterparty credit risk because a 
proper analysis requires dynamic multivariate credit risk models. 

Credit risk models can be divided into structural or firm-value models on the 
one hand and reduced-form models on the other. Broadly speaking, in a structural 
model default occurs when a stochastic variable (or, in dynamic models, a stochastic 
process), generally representing an asset value, falls below a threshold, generally 
representing liabilities. In reduced-form models the precise mechanism leading to 
default is left unspecified and the default time of a firm is modelled as a non- 
negative rv, whose distribution typically depends on economic covariables. In this 
chapter we treat structural models in Section 10.3, simple reduced-form models with 
deterministic hazard rates in Section 10.4 and more advanced reduced-form models 
in Sections 10.5 and 10.6. 


10.1 Credit-Risky Instruments 


In this section we give an overview of the universe of credit-risky instruments, 
starting with the simplest examples of loans and bonds. We include discussion of the 
counterparty credit risk in OTC derivatives trades and we also describe some of the 
more common modern credit derivative products. In what follows we often use the 
generic term obligor for the borrower, bond issuer, trading partner or counterparty 
to whom there is a credit exposure. The name stems from the fact that in all cases 
the obligor has a contractual obligation to make certain payments under certain 
conditions. 


10.1.1 Loans 


Loans are the oldest credit-risky “instruments” and come in a myriad of forms. It 
is common to categorize them according to the type of obligor into retail loans 
(to individuals and small or medium-sized companies), corporate loans (to larger 
companies), interbank loans and sovereign loans (to governments). In each of these 
categories there are likely to be a number of different lending products. For example,
retail customers may borrow money from a bank using mortgages against property, 
credit cards and overdrafts. 

The common feature of most loans is that a sum of money, known as the principal, 
is advanced to the borrower for a particular term in exchange for a series of defined 
interest payments, which may be at fixed or floating interest rates. At the end of the 
term the borrower is required to pay back the principal. 

A useful distinction to make is between secured and unsecured lending. If a loan 
is secured, the borrower has pledged an asset as collateral for the loan. A prime 
example is a mortgage, where the collateral is a property. In the event that the 
borrower is unable to fulfill its obligation to make interest payments or repay the 
principal, a situation that is termed default, the lender may take possession of the 
asset. In this way the loss in the event of default may be partly mitigated and money 
may be recovered by selling the asset. In an unsecured loan the lender has no such 
claim on a collateral asset and recoveries in the event of default may be a smaller 
fraction of the so-called exposure, which is the value of the outstanding principal 
and interest payments. 

Unlike bonds, which are publicly traded securities, loans are private agreements 
between the borrower and the lender. Hence there is a wide variety of different 
loan contracts with different legal features. This makes loans difficult to value 
under fair-value principles. Book value is commonly used, and where fair-value 
approaches are applied these mostly fall under the heading of level 3 valuation (see 
Section 2.2.2). 


10.1.2 Bonds 


Bonds are publicly traded securities issued by companies and governments that 
allow the issuer to raise funding on financial markets. Bonds issued by companies are 
called corporate bonds and bonds issued by governments are known as treasuries, 
sovereign bonds or, particularly in the UK, gilts (gilt-edged securities). 

The structure of the pay-offs is akin to that of a loan. The security commits the 
bond issuer (borrower) to make a series of interest payments to the bond buyer 
(lender) and pay back the principal at a fixed maturity. The interest payments, or 
coupons, may be fixed at the issuance of the bond (so-called fixed-coupon bonds). 
Alternatively, there are also bonds where the interest payments vary with market rates 
(so-called floating-rate notes). The reference for the floating rate is often LIBOR 
(the London Interbank Offered Rate). There are also convertible bonds, which allow 
the purchaser to convert them into shares of the issuing company at predetermined 
time points. These typically offer lower rates than conventional corporate bonds 
because the investor is being offered the option to participate in the future growth 
of the company. 

A bondholder is subject to a number of risks, particularly interest-rate risk, 
spread risk and default risk. As for loans, default risk is the risk that promised 
coupon and principal payments are not made. Historically, government bonds 
issued by developed countries have been considered to be default free; for obvi- 
ous reasons, after the European debt crisis of 2010-12, this notion was called into 
question. 

Spread risk is a form of market risk that refers to changes in credit spreads. The 
credit spread of a defaultable bond measures the difference in the yield of the bond 
and the yield of an equivalent default-free bond (see Section 10.3.2 for a formal 
definition of credit spreads). An increase in the spread of a bond means that the 
market value of the bond falls, which is generally interpreted as indicating that the 
financial markets perceive an increased default risk for the bond. 
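
The inverse relationship between spread and market value can be illustrated with a zero-coupon bond. The following Python sketch assumes continuous compounding, so that a defaultable zero-coupon bond with default-free yield r and spread s has price exp(-(r + s)T); this convention, the function names and the numbers are assumptions made for illustration only (the formal definition of the credit spread is given in Section 10.3.2).

```python
import math


def zero_coupon_price(risk_free_yield, credit_spread, maturity, face_value=100.0):
    """Price of a defaultable zero-coupon bond under continuous compounding."""
    return face_value * math.exp(-(risk_free_yield + credit_spread) * maturity)


def implied_spread(price, risk_free_yield, maturity, face_value=100.0):
    """Spread implied by a market price: bond yield minus default-free yield."""
    bond_yield = -math.log(price / face_value) / maturity
    return bond_yield - risk_free_yield


# Hypothetical five-year bond: a wider spread gives a lower market value.
price_tight = zero_coupon_price(0.02, 0.01, 5.0)
price_wide = zero_coupon_price(0.02, 0.02, 5.0)
print(price_tight, price_wide)
```
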


10.1.3 Derivative Contracts Subject to Counterparty Risk 


A significant proportion of all derivative transactions is carried out over the counter, 
and there is no central clearing counterparty such as an organized exchange that 
guarantees the fulfilment of the contractual obligations. These trades are therefore 
subject to the risk that one of the contracting parties defaults during the transaction, 
thus affecting the cash flows that are actually received by the other party. This risk, 
known as counterparty credit risk, received a lot of attention during the financial cri- 
sis of 2007-9, as some of the institutions heavily involved in derivative transactions 
experienced worsening credit quality or—in the case of Lehman Brothers—even 
a default event. Counterparty credit risk management is now a key issue for all 
financial institutions and is the focus of many new regulatory developments. 

In order to illustrate the challenges in measuring and managing counterparty credit 
risk, we consider the example of an interest-rate swap. This is a contract where two parties 
A and B agree to exchange a series of interest payments on a given nominal amount 
of money for a given period. For concreteness assume that A receives payments at 
a fixed interest rate and makes floating payments at a rate equal to the three-month 
LIBOR. 

Suppose now that A defaults at time $\tau_A$ before the maturity of the contract, so 
that the contract is settled at that date. The consequences will depend on the value 
of the remaining interest payments at that point in time. If interest rates have risen 
relative to their value at inception of the contract, the fixed interest payments have 
decreased in value so that the value of the swap contract has increased for B. Since 
A is no longer able to fulfill its obligations, its default constitutes a loss for B; the 
exact size of the loss will depend on the term structure of interest rates at the default 
time $\tau_A$. On the other hand, if interest rates have fallen relative to their value at 
t = 0, the fixed payments have increased in value so that the swap has a negative 
value for B. At settlement, B will still have to pay the value of the contract into the 
bankruptcy pool, so that there is no upside for B in A’s default. If B defaults first, 
the situation is reversed: falling rates lead to a counterparty-risk-related loss for A. 
This simple example illustrates two important points: the size of the counterparty 
credit exposure is not known a priori, and it is not even clear who has the credit 
exposure. 
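
The sign argument in this example can be checked numerically. The following Python sketch values the remaining swap payments from the point of view of B (who pays fixed and receives floating) and takes the positive part as B's exposure to A. The flat, annually compounded interest rate, the annual payment dates, valuation on a reset date (so that the floating leg including notional is worth par) and all names and numbers are simplifying assumptions for illustration.

```python
def swap_value_to_b(fixed_rate, market_rate, years_left, notional=1.0):
    """Value to B (fixed payer, floating receiver) of the remaining payments.

    Assumes a flat curve at `market_rate`, annual payments and valuation on a
    reset date, so that the floating leg (with notional) is worth par.
    """
    discount = [(1.0 + market_rate) ** -k for k in range(1, years_left + 1)]
    fixed_leg = notional * (fixed_rate * sum(discount) + discount[-1])
    floating_leg = notional
    return floating_leg - fixed_leg


def exposure_of_b_to_a(fixed_rate, market_rate, years_left, notional=1.0):
    """B suffers a loss only if the swap has positive value to B at A's default."""
    return max(swap_value_to_b(fixed_rate, market_rate, years_left, notional), 0.0)


# Hypothetical swap with a 3% fixed rate and three years remaining.
rates_up = exposure_of_b_to_a(fixed_rate=0.03, market_rate=0.05, years_left=3)
rates_down = exposure_of_b_to_a(fixed_rate=0.03, market_rate=0.01, years_left=3)
print(rates_up, rates_down)
```

If rates have risen, B has a positive exposure; if they have fallen, the exposure is zero, although B still owes the (negative) value of the contract at settlement.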

The management of counterparty risk raises a number of issues. First, counter- 
party risk has to be taken into account in pricing and valuation. This has led to 
various forms of credit value adjustment (CVA). Second, counterparty risk needs to 
be controlled using risk-mitigation techniques such as netting and collateralization. 
Under a netting agreement, the value of all derivatives transactions between A and 
B is computed and only the aggregated value is subject to counterparty risk; since 
offsetting transactions cancel each other out, this has the potential to reduce coun- 
terparty risk substantially. Under a collateralization agreement, the parties exchange 
collateral (cash and securities) that serves as a pledge for the receiver. The value of the 
collateral is adjusted dynamically to reflect changes in the value of the underlying 
transactions. 
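
The effect of netting and collateral on exposure can be sketched in a few lines of Python. The trade values and the collateral amount below are hypothetical, and the treatment of collateral as a simple deduction from the netted value is a simplifying assumption.

```python
def exposure_without_netting(values):
    """Sum of positive trade values: each trade is settled separately."""
    return sum(max(v, 0.0) for v in values)


def exposure_with_netting(values, collateral=0.0):
    """Positive part of the aggregated value, reduced by collateral held."""
    return max(sum(values) - collateral, 0.0)


# Hypothetical values, from A's perspective, of four trades with B.
trade_values = [12.0, -7.5, 3.0, -4.0]
gross = exposure_without_netting(trade_values)
net = exposure_with_netting(trade_values)
collateralized = exposure_with_netting(trade_values, collateral=2.0)
print(gross, net, collateralized)
```
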

[Figure 10.1: diagram. Labels: "Premium payments until default or maturity"; "Default of C occurs?"; "Yes: default payment"; "No: no payment".]

Figure 10.1. The basic structure of a CDS. Firm C is the reference entity, 
firm A is the protection buyer, and firm B is the protection seller. 


The proper assessment of counterparty risk requires a joint modelling of the 
default times of the two counterparties (A and B) and of the price dynamics of the 
underlying derivative contract. For that reason we defer the detailed discussion of 
this topic to Chapter 17. 


10.1.4 Credit Default Swaps and Related Credit Derivatives 


Credit derivatives are securities that are primarily used for the hedging and trading 
of credit risk. In contrast to the products considered so far, the promised pay-off 
of a credit derivative is related to credit events affecting one or more firms. Major 
participants in the market for credit derivatives are banks, insurance companies and 
investment funds. Retail banks are typically net buyers of protection against credit 
events; other investors such as hedge funds and investment banks often act as both 
sellers and buyers of credit protection. 


Credit default swaps. Credit default swaps (CDSs) are the workhorses of the credit 
derivatives market, and the market for CDSs written on larger corporations is fairly 
liquid; some numbers on the size of the market are given in Notes and Comments. 
The basic structure of a CDS is depicted in Figure 10.1. A CDS is a contract between 
two parties, the protection buyer and the protection seller. The pay-offs are related 
to the default of a reference entity (a financial firm or sovereign issuing bonds). 

If the reference entity experiences a default event before the maturity date T of 
the contract, the protection seller makes a default payment to the protection buyer, 
which mimics the loss due to the default of a bond issued by the reference entity 
(the reference asset); this part of a CDS is called the default payment leg. In this 
way the protection buyer has acquired financial protection against the loss on the 
reference asset he would incur in case of a default. As compensation, the protection 
buyer makes periodic premium payments (typically quarterly or semiannually) to 
the protection seller (the premium payment leg); after the default of the reference 
entity, premium payments stop. There is no initial payment. The premium payments 
are quoted in the form of an annualized percentage x* of the notional value of 
the reference asset; x* is termed the (fair or market-quoted) CDS spread. For a 
mathematical description of the payments, see Section 10.4.4. 
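
To make the two legs concrete, the following Python sketch lists the cash flows of a CDS from the point of view of the protection buyer. It is only an illustration under stated assumptions: the default payment is taken to be a fixed loss given default times the notional, accrued premium at default is ignored, and the function name, parameters and numbers are hypothetical.

```python
def cds_cash_flows(spread, notional, maturity, default_time=None,
                   loss_given_default=0.6, payments_per_year=4):
    """Cash flows (time, amount) of a CDS for the protection buyer.

    Premium payments are negative; the default payment is positive.
    Accrued premium at the default time is ignored in this sketch.
    """
    premium = spread * notional / payments_per_year
    n_dates = int(round(maturity * payments_per_year))
    flows = []
    for i in range(1, n_dates + 1):
        t = i / payments_per_year
        if default_time is not None and t > default_time:
            break  # premium payments stop after default
        flows.append((t, -premium))
    if default_time is not None and default_time <= maturity:
        flows.append((default_time, loss_given_default * notional))
    return flows


# Hypothetical five-year CDS with a spread of 2% on a notional of 1,000,000.
with_default = cds_cash_flows(0.02, 1_000_000, 5, default_time=1.6)
without_default = cds_cash_flows(0.02, 1_000_000, 5)
print(len(with_default), len(without_default))
```
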

There are a number of technical and legal issues in the specification of a CDS. In 
particular, the parties have to agree on the precise definition of a default event and 
on a procedure to determine the size of the default payment in case a default event 
of the reference entity occurs. Due to the efforts of bodies such as the International 
Swaps and Derivatives Association (ISDA), some standardization of these issues 
has taken place. 

Investors enter into CDS contracts for various reasons. To begin with, bond 
investors with a large credit exposure to the reference entity may buy CDS pro- 
tection to insure themselves against losses due to the default of a bond. This may 
be easier than reducing the original bond position because CDS markets are often 
more liquid than bond markets. Moreover, CDS positions are quickly settled. 

CDS contracts are also held for speculative reasons. In particular, so-called naked 
CDS positions, where the protection buyer does not own the bond, are often assumed 
by investors who are speculating on the widening of the credit spread of the reference 
entity. These positions are similar to short-selling bonds issued by the reference 
entity. Note that, in contrast to insurance, there is no requirement for the protection 
buyer to have insurable interest, that is, to actually own a bond issued by the reference 
entity. The speculative motive for holding CDSs is at least as important as the 
insurance motive. 

There has been some debate about the risks of the CDS market, particularly with 
respect to the large volume of naked positions and whether or not these should 
be limited. By taking naked CDS positions speculators can depress the prices of 
the bonds issued by the reference entity so that default becomes a self-fulfilling 
prophecy. The debate about the pros and cons of limiting naked CDS positions is 
akin to the debate about the pros and cons of limiting short selling on equity markets. 

A CDS is traded over the counter and is not guaranteed by a clearing house. 
A CDS position can therefore be subject to a substantial amount of counterparty 
risk, particularly if a trade is backed by insufficient collateral. A case in point arose 
during the credit crisis when AIG, which had sold many protection positions, had 
to be bailed out by the US government to prevent the systemic consequences of 
allowing it to default on its CDS contracts. There is concern that CDS markets have 
created a new form of dependency across financial institutions so that the default 
of one large (systemically important) institution could create a cascade of defaults 
across the financial sector due to counterparty risk. 

On the other hand, CDSs are useful risk-management tools. Because of the liq- 
uidity of CDS markets, CDSs are the natural underlying security for many more 
complex credit derivatives. Models for pricing portfolio-related credit derivatives 
are usually calibrated to quoted CDS spreads. With improved collateral manage- 
ment in CDS markets it has been argued that the potential for CDS markets to 
create large-scale default contagion has been substantially reduced (see Notes and 
Comments). 


Credit-linked notes. A credit-linked note is a combination of a credit derivative and 
a coupon bond that is sold as a fixed package. The coupon payments (and sometimes 
also the repayment of the principal) are reduced if a third party (the reference entity) 
experiences a default event during the lifetime of the contract, so the buyer of a 
credit-linked note is providing credit protection to the issuer of the note. 

Credit-linked notes are issued essentially for two reasons. First, from a legal point 
of view, a credit-linked note is treated as a fixed-income investment, so that investors 
who are unable to enter into a transaction involving credit derivatives directly (such 
as life insurance companies) may nonetheless sell credit protection by buying credit- 
linked notes. Second, an investor buying a credit-linked note pays the price up front, 
so that the protection buyer (the issuer of the credit-linked note) is protected against 
losses caused by the default of the protection seller. 


10.1.5 PD, LGD and EAD 


Regardless of whether we make a loan, buy a defaultable bond, engage in an OTC 
derivatives transaction, or act as protection seller in a CDS, the risk of a credit loss 
is affected by three, generally related, quantities: the exposure at default (EAD), the 
probability of default (PD) and the loss given default (LGD) or, equivalently, the 
size of the recovery in the event of default. They are key inputs to the Basel formula 
in the internal-ratings-based (IRB) approach to determining capital requirements 
for credit-risky portfolios, so it is important to consider them. 


Exposure at default. If we make a loan or buy a bond, our exposure is relatively 
easy to determine, since it is mainly the principal that is at stake. However, there is 
some additional uncertainty about the value of the interest payments that could be 
lost. A further source of exposure uncertainty is due to the widespread use of credit 
lines. Essentially, a credit line is a ceiling up to which a corporate client can borrow 
money at given terms, and it is up to the borrower to decide which part of the credit 
line he actually wants to use. For OTC derivatives, the counterparty risk exposure 
is even more difficult to quantify, since it is a stochastic variable depending on the 
unknown time at which a counterparty defaults and the evolution of the value of the 
derivative up to that point; a case in point is the example of an interest rate swap 
discussed in Subsection 10.1.3. 

In practice, the concept that is used to describe exposure is exposure at default or 
EAD, which recognizes that the exposure for many instruments will depend on the 
exact default time. In counterparty credit risk the use of collateral can also reduce 
the exposure and thus mitigate losses. 


Probability of default. When measuring the risk of losses over a fixed time hori- 
zon, e.g. one year, we are particularly concerned with estimating the probability 
that obligors default by the time horizon, a quantity known to practitioners as the 
probability of default, or PD. The PD is related to the credit quality of an obligor, 
and Sections 10.2 and 10.3 discuss some of the models that are used to quantify 
default risk. For instruments where the loss is dependent on the exact timing of 
default, e.g. OTC derivatives with counterparty risk, the risk of default is described 
by the whole distribution of possible default times and not just the probability of 
default by a fixed horizon. 


Loss given default. In the event of default, it is unlikely that the entire exposure 
is lost. For example, when a mortgage holder defaults on a residential mortgage, 
and there is no realistic possibility of restructuring the debt, the lender can sell the 
property (the collateral asset) and the proceeds from the sale will make good some 
of the lost principal. Similarly, when a bond issuer goes into administration, the 
bondholders join the group of creditors who will be partly recompensed for their 
losses by the sale of the firm’s assets. 

Practitioners use the term loss given default, or LGD, to describe the proportion 
of the exposure that is actually lost in the event of default, or its converse, the 
recovery, to describe the amount of the exposure that can be recovered through debt 
restructuring and asset sales. 


Dependence of these quantities. It is important to realize that EAD, PD and LGD 
are dependent quantities. While it is common to attempt to model them in terms of 
independent random variables, it is unrealistic to do so. For example, in a period 
of financial distress, when PDs are high, the asset values of firms are depressed 
and firms are defaulting, recoveries are likely to be correspondingly low, so that 
there is positive dependence between PDs and LGDs. This will be discussed further 
in Section 11.2.3. 
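
The consequence of ignoring this dependence can be illustrated with a small calculation. The Python sketch below uses the standard decomposition of a credit loss as EAD times a default indicator times LGD (a decomposition assumed here, not stated explicitly in this section) and a hypothetical two-scenario economy in which PD and LGD are both higher in a recession. All numbers are invented for illustration.

```python
# (probability of scenario, PD in scenario, LGD in scenario); hypothetical values.
scenarios = [
    (0.2, 0.05, 0.70),  # recession: high PD and low recovery
    (0.8, 0.01, 0.40),  # expansion: low PD and high recovery
]
ead = 100.0


def expected_loss_dependent(ead, scenarios):
    """Expected loss when PD and LGD move together across scenarios."""
    return ead * sum(w * pd * lgd for w, pd, lgd in scenarios)


def expected_loss_independent(ead, scenarios):
    """Expected loss if average PD and average LGD are simply multiplied."""
    mean_pd = sum(w * pd for w, pd, _ in scenarios)
    mean_lgd = sum(w * lgd for w, _, lgd in scenarios)
    return ead * mean_pd * mean_lgd


el_dependent = expected_loss_dependent(ead, scenarios)
el_independent = expected_loss_independent(ead, scenarios)
print(el_dependent, el_independent)
```

In this example the independence calculation understates the expected loss, because defaults are concentrated in the scenario with low recoveries.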


Notes and Comments 


For further reading on loans and loan pricing we refer to Benzschawel, Dagraca and 
Fok (2010). For an overview of bonds see Sharpe, Alexander and Bailey (1999). 

To get an idea of the size of the CDS market, note that the nominal value (gross 
notional amount) of the market stood at approximately $60 trillion by the end of 
2007, before coming down to a still considerable amount of approximately $25 tril- 
lion by the end of 2012. In 2013 the net notional amount was of the order of $2 tril- 
lion. For comparison, by the end of 2012 world GDP stood at roughly $80 trillion. 
To give an example of the size of the speculative market in CDSs, Cont (2010) 
reports that “when it filed for bankruptcy on September 14, 2008, Lehman Brothers 
had $155 billion of outstanding debt, but more than $400 billion notional value 
of CDS contracts had been written with Lehman as reference entity”. A good dis- 
cussion of the role of such credit derivatives in the credit crisis is given in Stulz 
(2010). The effect of improved collateral management for CDSs on the risk of large- 
scale contagion in CDS markets is addressed in Brunnermeier, Clerc and Scheicher 
(2013). 

In this brief introduction we have discussed a few essential features of credit 
derivatives but have omitted the rather involved regulatory, legal and accounting 
issues related to these instruments. Readers interested in these topics are referred 
to the paper collections edited by Gregory (2003) and Perraudin (2004), in which 
pricing issues are also discussed. An excellent treatment of credit derivatives at 
textbook level is Schönbucher (2003). For a discussion of credit derivatives from 
the viewpoint of financial engineering we refer to Neftci (2008). 


10.2 Measuring Credit Quality 


There are various ways of quantifying the credit quality or, equivalently, the default 
risk of obligors but, broadly speaking, these approaches may be divided into two 
philosophies. On the one hand, credit quality can be described by a credit rating or 
credit score that is based on empirical data describing the borrowing and repayment 
history of the obligor, or of similar obligors. On the other hand, for obligors whose 
equity is traded on financial markets, prices can be used to infer the market’s view 
of the credit quality of the obligor. This section is devoted to the first philosophy, 
and market-implied measures of credit quality are treated in the context of structural 
models in Section 10.3. 

Credit ratings and credit scores fulfill a similar function—they both allow us to 
order obligors according to their credit risk and map that risk to an estimate of 
default probability. Credit ratings tend to be expressed on an ordered categorical 
scale, whereas credit scores are often expressed in terms of points on a metric 
scale. The task of rating obligors, particularly large corporates or sovereigns, is 
often outsourced to a rating agency such as Moody’s or Standard & Poor’s (S&P); 
proprietary rating systems internal to a financial institution can also be used. In the 
S&P rating system there are seven pre-default rating categories, labelled AAA, AA, 
A, BBB, BB, B, CCC, with AAA being the highest rating and CCC the lowest rating; 
Moody’s uses nine pre-default rating categories and these are labelled Aaa, Aa, A, 
Baa, Ba, B, Caa, Ca, C. A finer alpha-numeric system is also used by both agencies. 

Credit scores are traditionally used for retail customers and are based on so-called 
scorecards that banks develop through extensive statistical analyses of historical 
data. The basic idea is that default risk is modelled as a function of demographic, 
behavioural and financial covariates that describe the obligor. Using techniques such 
as logistic regression these covariates are weighted and combined into a score. 
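
As a minimal sketch of how a logistic-regression scorecard maps covariates to a default probability, consider the following Python function. The two covariates, their weights and the intercept are hypothetical; in practice they would be estimated from historical data, and the linear predictor is interpreted here as the log-odds of default.

```python
import math


def default_probability(covariates, weights, intercept):
    """Logistic model: PD = 1 / (1 + exp(-(intercept + sum of weight * covariate)))."""
    log_odds = intercept + sum(w * x for w, x in zip(weights, covariates))
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical covariates: debt-to-income ratio and years with current employer.
weights = [0.8, -0.5]
intercept = -3.0
pd_base = default_probability([0.4, 2.0], weights, intercept)
pd_more_debt = default_probability([0.9, 2.0], weights, intercept)
print(pd_base, pd_more_debt)
```
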


10.2.1 Credit Rating Migration 


In the credit-migration approach each firm is assigned to a credit-rating category at 
any given time point. The probability of moving from one credit rating to another over 
a given risk horizon (typically one year) is then specified. Transition probabilities are 
typically presented in the form of a matrix; an example from Moody’s is presented 
in Table 10.1. Transition matrices are estimated from historical default data, and 
standard statistical methods used for this purpose are discussed in Section 10.2.2. 

In the credit-migration approach we assume that the current credit rating com- 
pletely determines the default probability, so that this probability can be read from the 
transition matrix. For instance, if we use the transition matrix presented in Table 10.1, 
we obtain a one-year default probability for an A-rated company of 0.06%, whereas 
the default probability of a Caa-rated company is 13.3%. In practice, a correction 
to the figures in Table 10.1 would probably be undertaken to account for rating 
withdrawals: that is, transitions to the WR state. The simplest correction would be 
to divide the first nine probabilities in each row of the table by one minus the final 
probability in that row; this implicitly assumes that the act of rating withdrawal 
contains no information about the likelihood of upgrade, downgrade or default of 
an obligor. 


Table 10.1. Probabilities of migrating from one rating quality to another within one year. 
“WR” represents the proportion of firms that were no longer rated at the end of the year, for 
various reasons including takeover by another company. Source: Ou (2013, Exhibit 26). 

Initial                       Rating at year-end (%) 
rating      Aaa     Aa      A    Baa     Ba      B    Caa   Ca-C  Default     WR 

Aaa       87.20   8.20   0.63   0.00   0.03   0.00   0.00   0.00     0.00   3.93 
Aa         0.91  84.57   8.43   0.49   0.06   0.02   0.01   0.00     0.02   5.48 
A          0.06   2.48  86.07   5.47   0.57   0.11   0.03   0.00     0.06   5.13 
Baa       0.039   0.17   4.11  84.84   4.05   7.55   1.63   0.02     0.17   5.65 
Ba         0.01   0.05   0.35   5.52  75.75   7.22   0.58   0.07     1.06   9.39 
B          0.01   0.03   0.11   0.32   4.58  73.53   5.81   0.59     3.85  11.16 
Caa        0.01   0.02   0.02   0.12   0.38   8.70  61.71   3.72    13.34  12.00 
Ca-C       0.00   0.00   0.00   0.00   0.40   2.03   9.38  35.46    37.93  14.80 


Table 10.2. Average cumulative default rates (%). Source: Ou (2013, Exhibit 33). 

Initial                        Term (years) 
rating        1      2      3      4      5     10     15 

Aaa        0.00   0.01   0.01   0.04   0.11   0.50   0.93 
Aa         0.02   0.07   0.14   0.26   0.38   0.92   1.75 
A          0.06   0.20   0.41   0.63   0.87   2.48   4.26 
Baa        0.18   0.50   0.89   1.37   1.88   4.70   8.62 
Ba         1.11   3.08   5.42   7.93  10.18  19.70  29.17 
B          4.05   9.60  15.22  20.13  24.61  41.94  52.22 
Caa-C     16.45  27.87  36.91  44.13  50.37  69.48  79.18 
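
The simple correction for rating withdrawals can be sketched as follows in Python, using the Ba row of Table 10.1 as input. The function name is hypothetical, and the code does nothing more than the rescaling described above.

```python
def remove_withdrawals(row_percent):
    """Rescale a transition row (in %) whose last entry is the WR percentage."""
    *rated, withdrawn = row_percent
    scale = 1.0 - withdrawn / 100.0
    return [p / scale for p in rated]


# Ba row of Table 10.1: Aaa, Aa, A, Baa, Ba, B, Caa, Ca-C, Default, WR.
ba_row = [0.01, 0.05, 0.35, 5.52, 75.75, 7.22, 0.58, 0.07, 1.06, 9.39]
ba_adjusted = remove_withdrawals(ba_row)
print(ba_adjusted[-1])  # withdrawal-adjusted one-year default probability (%)
```

After the correction the nine remaining probabilities sum to (approximately) 100%, and the one-year default probability of a Ba-rated firm rises from 1.06% to roughly 1.17%.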

Rating agencies also produce cumulative default probabilities over larger time 
horizons. In Table 10.2 we reproduce Moody’s cumulative default probabilities for 
companies with a given current credit rating. For instance, according to this table 
the probability that a company whose current credit rating is Baa defaults within the 
next four years is 1.37%. These cumulative default probabilities have been estimated 
directly from default data. Alternative estimates of multi-year default probabilities 
can be inferred from one-year transition matrices, as explained in more detail in the 
next section. 


Remark 10.1 (accounting for business cycles). It is a well-established empirical 
fact that default rates tend to vary with the state of the economy, being high during 
recessions and low during periods of economic expansion (see Figure 10.2 for an 
illustration). Transition rates as estimated by S&P or Moody’s, on the other hand, 
are historical averages over longer time horizons covering several business cycles. 
For instance, the transition rates in Table 10.1 have been estimated from rating- 
migration data over the period 1970-2012. Moreover, rating agencies focus on the 
average credit quality “through the business cycle” when attributing a credit rating 
to a particular firm. The default probabilities from the credit-migration approach are 
therefore estimates for the average default probability, independent of the current 
economic environment. In some situations we are interested in “point-in-time” esti- 
mates of default probabilities reflecting the current macroeconomic environment, 
such as in the pricing of a short-term loan. In these situations adjustments to the 
long-term average default probabilities from the credit-migration approach can be 
made; for instance, we could use equity prices as an additional source of informa- 
tion, as is done in the public-firm EDF (expected default frequency) model discussed 
in Section 10.3.3. 


Figure 10.2. Moody’s annual default rates from 1920 to 2012. 
Source for data: Ou (2013, Exhibit 30). 


10.2.2 Rating Transitions as a Markov Chain 


Let $(R_t)$ denote a discrete-time stochastic process defined at times $t = 0, 1, \dots$ that 
takes values in $S = \{0, 1, \dots, n\}$. The set $S$ defines rating states of increasing cred- 
itworthiness, with 0 representing default. $(R_t)$ models the evolution of an obligor’s 
rating over time. 

We will assume that $(R_t)$ is a Markov chain. This means that conditional transition 
probabilities satisfy the Markov property 

$$P(R_t = k \mid R_0 = r_0, R_1 = r_1, \dots, R_{t-1} = j) = P(R_t = k \mid R_{t-1} = j)$$

for all $t \ge 1$ and all $j, r_0, r_1, \dots, r_{t-2}, k \in S$. In words, the conditional probabilities 
of rating transitions, given an obligor’s rating history, depend only on the previous 
rating $R_{t-1} = j$ at the last time point and not on the more distant history of how the 
obligor arrived at a rating state $j$ at time $t - 1$. 

The Markov assumption for rating migrations has been criticized; there is evi- 
dence for both momentum and stickiness in empirical rating histories (see Lando and 
Skødeberg 2002). Momentum is the phenomenon by which obligors who have been 
recently downgraded to a particular rating are more likely to experience further 
downgrades than obligors who have had the same rating for a long time. Sticki- 
ness is the converse phenomenon by which rating agencies are initially hesitant to 
downgrade obligors until the evidence for credit deterioration is overwhelming. But 
despite these issues, the Markov chain assumption is very widely made, because it 
leads to tractable models with a well-understood theory and to natural estimators 
for transition probabilities. 

The Markov chain is stationary if 

$$P(R_t = k \mid R_{t-1} = j) = P(R_1 = k \mid R_0 = j)$$

for all $t \ge 1$ and all rating states $j$ and $k$. In this case we can define the transition 
matrix to be the $(n + 1) \times (n + 1)$ matrix $P = (p_{jk})$ with elements 
$p_{jk} = P(R_t = k \mid R_{t-1} = j)$ for any $t \ge 1$. Simple conditional probability arguments can be used 
to derive the Chapman–Kolmogorov equations, which say that for any $t \ge 2$, and 
any $j, k \in S$, 

$$P(R_t = k \mid R_{t-2} = j) = \sum_{l \in S} P(R_t = k \mid R_{t-1} = l) \, P(R_{t-1} = l \mid R_{t-2} = j) = \sum_{l \in S} p_{lk} \, p_{jl}.$$

An implication of this is that the matrix of transition probabilities over two time 
steps is given by $P^2 = P \times P$. Similarly, the matrix of transition probabilities over 
$T$ time periods is $P^T$. It is, however, not clear how we would compute a matrix 
of transition probabilities for a fraction of a time period. In fact, this requires the 
notion of a Markov chain in continuous time, which is discussed below. 
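
The computation of multi-period transition probabilities by matrix powers can be sketched in Python with NumPy. The three-state transition matrix below (state 0 is default, states 1 and 2 are rating classes) is a toy example with invented numbers, and treating default as an absorbing state is an assumption of the sketch.

```python
import numpy as np

# Hypothetical one-period transition matrix; rows and columns ordered 0, 1, 2.
P = np.array([
    [1.00, 0.00, 0.00],  # state 0: default, assumed absorbing
    [0.10, 0.80, 0.10],  # state 1: lower rating
    [0.01, 0.09, 0.90],  # state 2: higher rating
])


def multi_period_matrix(one_period, periods):
    """Transition matrix over `periods` time steps (a matrix power)."""
    return np.linalg.matrix_power(one_period, periods)


two_period = multi_period_matrix(P, 2)
# With default absorbing, column 0 gives cumulative default probabilities.
five_period_pd = multi_period_matrix(P, 5)[:, 0]
print(two_period[:, 0], five_period_pd)
```

For example, the two-period default probability from state 2 is 0.01 + 0.09 × 0.10 + 0.90 × 0.01 = 0.028, in agreement with the Chapman–Kolmogorov sum.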

We now turn to the problem of estimating P. Suppose we observe, or are given 
information about, the ratings of companies at the time points 0,1,..., T. This 
information usually relates to a fluctuating population or cohort of companies, with 
only a few having complete rating histories throughout [0, T]: new companies may 
be added to the cohort at any time; some companies may default and leave the cohort; 
others may have their rating withdrawn. In the latter case we will assume that the 
withdrawal of rating occurs independently of the default or rating-migration risk of 
the company (which may not be true). 

For $t = 0, \dots, T - 1$ and $j \in S \setminus \{0\}$, let $N_{tj}$ denote the number of companies that 
are rated $j$ at time $t$ and for which a rating is available at time $t + 1$; let $N_{tjk}$ denote 
the number of those companies that are rated $k$ at time $t + 1$. Under the discrete- 
time, homogeneous Markovian assumption, independent multinomial experiments 
effectively take place at each time $t$. In each experiment the $N_{tj}$ companies rated 
$j$ can be thought of as being randomly allocated to the ratings $k \in S$ according to 
probabilities $p_{jk}$ that satisfy $\sum_{k=0}^{n} p_{jk} = 1$. 

In this framework the likelihood is given by 

$$L\bigl((p_{jk}); (N_{tj}), (N_{tjk})\bigr) = \prod_{t=0}^{T-1} \Biggl( \prod_{j=1}^{n} N_{tj}! \prod_{k=0}^{n} \frac{p_{jk}^{N_{tjk}}}{N_{tjk}!} \Biggr),$$

and if this is maximized subject to the constraints that $\sum_{k=0}^{n} p_{jk} = 1$ for 
$j = 1, \dots, n$, we obtain the maximum likelihood estimator 

$$\hat{p}_{jk} = \frac{\sum_{t=0}^{T-1} N_{tjk}}{\sum_{t=0}^{T-1} N_{tj}}. \qquad (10.1)$$
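
The estimator (10.1) amounts to counting transitions and normalizing by row. The following Python sketch computes it from a small panel of rating histories. The data are invented, the encoding of an unavailable rating as -1 is an assumption of the sketch, and rows for which no transitions are observed (including the default state 0) are returned as NaN.

```python
import numpy as np


def cohort_estimator(ratings, n_states):
    """Estimate (10.1) from integer rating histories (rows: companies).

    Ratings take values 0, ..., n_states - 1, with 0 = default and
    -1 = no rating available at that time (an encoding assumed here).
    """
    counts = np.zeros((n_states, n_states))
    for history in ratings:
        for r_from, r_to in zip(history[:-1], history[1:]):
            # Count only companies rated j >= 1 at t with a rating at t + 1.
            if r_from > 0 and r_to >= 0:
                counts[r_from, r_to] += 1
    totals = counts.sum(axis=1, keepdims=True)
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(totals > 0, counts / totals, np.nan)


# Hypothetical histories of four companies at times 0, 1, 2, 3.
ratings = np.array([
    [2, 2, 1, 1],
    [2, 1, 0, -1],   # defaults at time 2 and leaves the cohort
    [1, 1, 1, 2],
    [-1, 2, 2, 2],   # enters the cohort at time 1
])
p_hat = cohort_estimator(ratings, n_states=3)
print(p_hat)
```

In this toy panel no direct transition from state 2 to default is observed, so the corresponding estimate is exactly zero.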


There are a number of drawbacks to modelling rating transitions as a discrete- 
time Markov chain. In practice, rating changes tend to take place on irregularly 
spaced dates. While such data can be approximated by a regularly spaced time 
series (or panel) of, say, yearly, quarterly or monthly ratings, there will be a loss 
of information in doing so. The discrete-time model described above would ignore 
any information about intermediate transitions taking place between two times $t$ 
and $t + 1$. For example, if an obligor is downgraded from A to BBB to BB over the 
course of the period $[t, t + 1]$, this obligor will simply be recorded as migrating from 
A to BB and the information about transitions from A to BBB and BBB to BB will 
not be recorded. Moreover, the estimation procedure for a discrete-time chain tends 
to result in sparse estimates of transition matrices with quite a lot of zero entries. 
For example, if no transitions between AAA and default within a single time period 
are observed, then the probability of such a transition will be estimated to be zero. 
However, in reality such a transition is possible, if unlikely, and so its estimated 
probability of occurrence should not be zero. 

It is thus more satisfactory to model rating transitions as a phenomenon in con- 
tinuous time. In this case, transition probabilities are not modelled directly but are 
instead given in terms of transition rates. Intuitively, the relationship between tran- 
sition rates and transition probabilities can be described as follows. 

Assume that over any small time step of duration $\delta t$ the probability of a transition
from rating j to k is given approximately by $\lambda_{jk}\,\delta t$ for some constant
$\lambda_{jk} > 0$, which is the transition rate between rating j and rating k. The
probability of staying at rating j is given by $1 - \sum_{k \ne j} \lambda_{jk}\,\delta t$.
If we define a matrix $\Lambda$ to have off-diagonal entries $\lambda_{jk}$ and diagonal
entries $-\sum_{k \ne j} \lambda_{jk}$, we can summarize the implied transition
probabilities for the small time step $\delta t$ in the matrix $(I_{n+1} + \Lambda\,\delta t)$.
We now consider transitions in the period [0, t] and denote the corresponding matrix of
transition probabilities by P(t). If we divide the time period into N small time steps of
size $\delta t = t/N$ for N large, the matrix of transition probabilities can be approximated by

$$P(t) \approx \Bigl( I_{n+1} + \frac{\Lambda t}{N} \Bigr)^{N},$$

which converges, as $N \to \infty$, to the so-called matrix exponential of $\Lambda t$:

$$P(t) = e^{\Lambda t}.$$

This formulation gives us a method of computing transition probabilities for any
time horizon t in terms of the matrix $\Lambda$, the so-called generator matrix.
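The limiting argument can be tried out numerically. The sketch below approximates the transition matrix by the power $(I + \Lambda t/N)^N$ with $N = 2^{20}$, computed by repeated squaring in plain Python. The three-state generator (default, B, A) and all function names are hypothetical choices for illustration; in practice one would more likely call a library routine for the matrix exponential.

```python
def mat_mul(a, b):
    """Product of two matrices given as nested lists."""
    return [[sum(a[i][l] * b[l][j] for l in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transition_matrix(generator, t, doublings=20):
    """Approximate exp(generator * t) by (I + generator * t / N)^N, N = 2**doublings."""
    n = len(generator)
    big_n = 2 ** doublings
    step = [[(1.0 if i == j else 0.0) + generator[i][j] * t / big_n
             for j in range(n)] for i in range(n)]
    for _ in range(doublings):      # repeated squaring gives the N-th power
        step = mat_mul(step, step)
    return step

# Hypothetical generator; state 0 is default (absorbing), state 1 is B, state 2 is A.
# Each row sums to zero, and the direct rate from A to default is set to zero.
generator = [
    [0.00, 0.00, 0.00],
    [0.05, -0.15, 0.10],
    [0.00, 0.10, -0.10],
]
p_one_year = transition_matrix(generator, 1.0)
```

Although the direct rate from A to default is zero in this example, the one-year probability `p_one_year[2][0]` is strictly positive, because the chain can pass through B within the year.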

A Markov chain in continuous time with generator matrix $\Lambda$ can be constructed
in the following way. An obligor remains in rating state j for an exponentially
distributed amount of time with parameter

$$\lambda_{jj} = \sum_{k \ne j} \lambda_{jk},$$

i.e. minus the diagonal element of the generator matrix. When a transition takes place
the new rating is determined by a multinomial experiment in which the probability
of a transition from state j to state k is given by $\lambda_{jk}/\lambda_{jj}$.
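This construction translates directly into a simulation routine. The following is a minimal sketch, assuming the generator is given as a nested list with state 0 as an absorbing default state; the function name and the example generator are hypothetical.

```python
import random

def simulate_rating_path(generator, start, horizon, rng):
    """Simulate one rating path on [0, horizon] as a list of (time, state) pairs."""
    path = [(0.0, start)]
    t, state = 0.0, start
    while True:
        exit_rate = -generator[state][state]     # lambda_jj
        if exit_rate <= 0.0:
            break                                # absorbing state, no further moves
        t += rng.expovariate(exit_rate)          # exponential holding time
        if t >= horizon:
            break
        targets = [k for k in range(len(generator)) if k != state]
        weights = [generator[state][k] for k in targets]   # proportional to lambda_jk / lambda_jj
        state = rng.choices(targets, weights=weights)[0]
        path.append((t, state))
    return path

# Hypothetical generator; state 0 is default, state 1 is B, state 2 is A.
generator = [
    [0.00, 0.00, 0.00],
    [0.05, -0.15, 0.10],
    [0.00, 0.10, -0.10],
]
path = simulate_rating_path(generator, start=2, horizon=50.0, rng=random.Random(0))
```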

This construction also leads to natural estimators for the matrix $\Lambda$. Since
$\lambda_{jk}$ is the instantaneous rate of migrating from j to k, we can estimate it by

$$\hat{\lambda}_{jk} = \frac{N_{jk}(T)}{\int_0^T Y_j(t)\,dt}, \tag{10.2}$$

where $N_{jk}(T)$ is the total number of observed transitions from j to k over the
time period [0, T] and $Y_j(t)$ is the number of obligors with rating j at time t; the
denominator therefore represents the total time spent in state j by all the companies
in the data set. Note that this is the continuous-time analogue of the maximum
likelihood estimator in (10.1); it is not surprising, therefore, that (10.2) can be
shown to be the maximum likelihood estimator for the transition intensities of a
homogeneous continuous-time Markov chain.
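As an illustration of (10.2), the sketch below counts transitions and accumulates the time spent in each rating from dated rating histories. The data layout (a list of `(time, rating)` pairs per obligor, each rating being held until the next pair or the end of the observation window) and the toy histories are assumptions of this example.

```python
from collections import defaultdict

def generator_estimator(histories, horizon):
    """Estimate transition rates as in (10.2) from dated rating histories.

    histories: one list per obligor of (time, rating) pairs, sorted by time and
    starting at time 0; the last rating is held until `horizon`.
    """
    transitions = defaultdict(int)      # N_jk(T)
    time_in_state = defaultdict(float)  # integral of Y_j(t) over [0, T]
    for history in histories:
        for (t0, j), (t1, k) in zip(history, history[1:]):
            time_in_state[j] += t1 - t0
            transitions[(j, k)] += 1
        last_time, last_state = history[-1]
        time_in_state[last_state] += horizon - last_time
    return {(j, k): n / time_in_state[j] for (j, k), n in transitions.items()}

# Hypothetical histories over [0, 3]; the first obligor moves A -> BBB -> BB.
histories = [
    [(0.0, "A"), (1.5, "BBB"), (1.8, "BB")],
    [(0.0, "A")],
    [(0.0, "BBB"), (2.5, "A")],
]
rates = generator_estimator(histories, horizon=3.0)
```

Unlike the discrete-time estimator, both intermediate transitions of the first obligor (A to BBB and BBB to BB) enter the estimate.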


Notes and Comments 


There is a large literature on credit scoring, and useful starter references are Thomas 
(2009) and Hand and Henley (1997). In addition to the well-known commercial rat- 
ing agencies there are now open rating systems. One example is the Credit Research 
Initiative at the Risk Management Institute of the National University of Singapore 
(see www.rmicri.org). 

An alternative discussion of models based on rating migration is given in Chap- 
ters 7 and 8 of Crouhy, Galai and Mark (2001). Statistical approaches to the estima- 
tion of rating-transition matrices are discussed in Hu, Kiesel and Perraudin (2002) 
and Lando and Skodeberg (2002). The latter paper also shows that there is momen- 
tum in rating-transition data, which contradicts the assumption that rating transitions 
form a Markov chain. An example of an industry model based on credit ratings is 
CreditMetrics: see RiskMetrics Group (1997). 

The literature on the statistical properties of rating transitions is surveyed exten- 
sively in Chapter 4 of Duffie and Singleton (2003). The maximum likelihood esti- 
mator of the infinitesimal generator of a continuous-time Markov chain is formally 
derived in Albert (1962). For further information on Markov chains we refer to the 
standard textbook by Norris (1997). 


10.3 Structural Models of Default


In structural or firm-value models of default one postulates a mechanism for the 
default of a firm in terms of the relationship between its assets and liabilities. Typi- 
cally, default occurs whenever a stochastic variable (or in dynamic models a stochas- 
tic process) generally representing an asset value falls below a threshold representing 
liabilities. The kind of thinking embodied in these models has been very influen- 
tial in the analysis of credit risk and in the development of industry solutions, so 
that this is a natural starting point for a discussion of credit risk models. We begin 
with a detailed analysis of the seminal model of Merton (1974) (in Sections 10.3.1 
and 10.3.2). Industry implementations of structural models are discussed in Sec- 
tion 10.3.3. 

From now on we denote a generic stochastic process in continuous time by $(X_t)$;
the value of the process at time $t \ge 0$ is given by the rv $X_t$.


10.3.1 The Merton Model 


The model proposed in Merton (1974) is the prototype of all firm-value models. 
Consider a firm whose asset value follows some stochastic process $(V_t)$. The firm
finances itself by equity (i.e. by issuing shares) and by debt. In Merton’s model, debt
consists of zero-coupon bonds with common maturity T; the nominal value of debt
at maturity is given by the constant B. Moreover, it is assumed that the firm cannot 
pay out dividends or issue new debt. 

The values at time t of equity and debt are denoted by $S_t$ and $B_t$. Default occurs if
the firm misses a payment to its debtholders, which in the Merton model can occur
only at the maturity T of the bonds. At T we have to distinguish between two cases.


(i) $V_T > B$: the value of the firm’s assets exceeds the nominal value of the
liabilities. In that case the debtholders (the owners of the zero-coupon bonds)
receive B, the shareholders receive the residual value $S_T = V_T - B$, and there
is no default.


(ii) $V_T \le B$: the value of the firm’s assets does not exceed its liabilities and the
firm cannot meet its financial obligations. In that case shareholders have no interest
in providing new equity capital, as these funds would go immediately to the
bondholders. They therefore let the firm go into default. Control over the firm’s
assets is passed on to the bondholders, who liquidate the firm and distribute
the proceeds among themselves. Shareholders pay and receive nothing, so
that we have $B_T = V_T$, $S_T = 0$.


Summarizing, we have the relationships 


$$S_T = \max(V_T - B, 0) = (V_T - B)^+, \tag{10.3}$$
$$B_T = \min(V_T, B) = B - (B - V_T)^+. \tag{10.4}$$


Equation (10.3) implies that the value of the firm’s equity at time T equals the pay-off
of a European call option on $V_T$, while (10.4) implies that the value of the firm’s
debt at maturity equals the nominal value of the liabilities minus the pay-off of a
European put option on $V_T$ with exercise price equal to B.

This model is of course a stylized description of default. In reality, the structure of a 
company’s debt is much more complex, so that default can occur on many different 
dates. Moreover, under modern bankruptcy code, default does not automatically 
imply bankruptcy, i.e. liquidation of a firm. Nonetheless, Merton’s model is a useful 
starting point for modelling credit risk and for pricing securities subject to default. 


Remark 10.2. The option interpretation of equity and debt is useful in explain- 
ing potential conflicts of interest between the shareholders and debtholders of a 
company. It is well known that, all other things being equal, the value of an option 
increases if the volatility of the underlying security is increased. Shareholders there- 
fore have an interest in the firm taking on risky projects. Bondholders, on the other 
hand, have a short position in a put option on the firm’s assets and would therefore 
like to see the volatility of the asset value reduced. 


In the Merton model it is assumed that under the real-world or physical probability
measure P the process $(V_t)$ follows a diffusion model (known as the Black-Scholes
model or geometric Brownian motion) of the form

$$dV_t = \mu_V V_t\,dt + \sigma_V V_t\,dW_t \tag{10.5}$$

for constants $\mu_V \in \mathbb{R}$ (the drift of the asset value process), $\sigma_V > 0$
(the volatility of the asset value process), and a standard Brownian motion $(W_t)$.
Equation (10.5) can be solved explicitly, and it can be shown that

$$V_T = V_0 \exp\bigl((\mu_V - \tfrac{1}{2}\sigma_V^2)T + \sigma_V W_T\bigr).$$

Since $W_T \sim N(0, T)$, it follows that
$\ln V_T \sim N(\ln V_0 + (\mu_V - \tfrac{1}{2}\sigma_V^2)T, \sigma_V^2 T)$.
Under the dynamics (10.5), the default probability of the firm is readily computed.
We have

$$P(V_T \le B) = P(\ln V_T \le \ln B) = \Phi\biggl(\frac{\ln(B/V_0) - (\mu_V - \tfrac{1}{2}\sigma_V^2)T}{\sigma_V \sqrt{T}}\biggr). \tag{10.6}$$

It may be deduced from (10.6) that the default probability is increasing in B, decreasing
in $V_0$ and $\mu_V$ and, for $V_0 > B$, increasing in $\sigma_V$. All these properties are
perfectly in line with economic intuition.

Figure 10.3 shows two simulated trajectories for the asset value process $(V_t)$ for
values $V_0 = 1$, $\mu_V = 0.03$ and $\sigma_V = 0.25$. Assuming that B = 0.85 and T = 1, one
path is a default path, terminating at a value $V_T < B$, while the other is a non-default
path.
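The default probability (10.6) is straightforward to evaluate. The sketch below uses the standard normal distribution function from Python's standard library and the parameter values quoted for Figure 10.3; the function name is a hypothetical choice.

```python
from math import log, sqrt
from statistics import NormalDist

def merton_default_probability(v0, b, mu, sigma, horizon):
    """Physical default probability P(V_T <= B) according to (10.6)."""
    d = (log(b / v0) - (mu - 0.5 * sigma ** 2) * horizon) / (sigma * sqrt(horizon))
    return NormalDist().cdf(d)

# Parameters of Figure 10.3: V_0 = 1, B = 0.85, mu_V = 0.03, sigma_V = 0.25, T = 1.
pd_base = merton_default_probability(v0=1.0, b=0.85, mu=0.03, sigma=0.25, horizon=1.0)

# Comparative statics discussed in the text.
pd_more_debt = merton_default_probability(1.0, 0.95, 0.03, 0.25, 1.0)
pd_more_drift = merton_default_probability(1.0, 0.85, 0.10, 0.25, 1.0)
pd_more_vol = merton_default_probability(1.0, 0.85, 0.03, 0.35, 1.0)
```

With these inputs the default probability is roughly one quarter, which explains why a default path is easy to generate in a simulation with the parameters of Figure 10.3.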


Figure 10.3. Illustration of (a) a default path and (b) a non-default path in Merton’s model.
The solid lines show simulated one-year trajectories for the asset value process $(V_t)$
starting at $V_0 = 1$ with parameters $\mu_V = 0.03$ and $\sigma_V = 0.25$. Assuming that the
debt has face value B = 0.85 and maturity T = 1 and that the interest rate is r = 0.02, the
dotted curve shows the value of default-free debt $(Bp_0(t, T))$ while the dashed line shows
the evolution of the company’s debt $B_t$ according to formula (10.12). The difference
between the asset value $V_t$ and the debt $B_t$ is the value of equity $S_t$.


10.3.2 Pricing in Merton’s Model


In the context of Merton’s model one can price securities whose pay-off depends
on the value $V_T$ of the firm’s assets at T. Prime examples are the firm’s debt, or,
equivalently, the zero-coupon bonds issued by the firm, and the firm’s equity. In our
analysis of pricing in the context of the Merton model we make use of a few basic
concepts from financial mathematics and stochastic calculus; references to useful
texts in financial mathematics are given in Notes and Comments.
We make the following assumptions.


Assumption 10.3. 


(i) The risk-free interest rate is deterministic and equal to r > 0.


(ii) The firm’s asset-value process $(V_t)$ is independent of the way the firm is
financed, and in particular it is independent of the debt level B.


(iii) The asset value $(V_t)$ can be traded on a frictionless market, and the asset-value
dynamics are given by the geometric Brownian motion (10.5).


These assumptions merit some comments. First, the independence of $(V_t)$ from
the financial structure of the firm is questionable, because a very high debt level, 
and hence a high default probability, may adversely affect the ability of a firm to 
generate business, hence affecting the value of its assets. This is a special case of 
the indirect bankruptcy costs discussed in Section 1.4.2. Second, while there are 
many firms with traded equity, the value of the assets of a firm is usually neither 
completely observable nor traded. We come back to this issue in Section 10.3.3. For 
an example where (iii) holds, think of an investment company or trust that invests in 
liquidly traded securities and uses debt financing to leverage its position. In that case 
$V_t$ corresponds to the value of the investment portfolio at time t, and this portfolio
consists of traded securities by assumption. 


Pricing of equity and debt. Consider a claim on the asset value of the firm with
maturity T and pay-off $h(V_T)$, such as the firm’s equity and debt in (10.3) and (10.4).
Under Assumption 10.3, the fair value $f(t, V_t)$ of this claim at time t < T can be
computed using the risk-neutral pricing rule as the expectation of the discounted
pay-off under the risk-neutral measure Q, that is,

$$f(t, V_t) = E^Q\bigl(e^{-r(T-t)} h(V_T) \mid \mathcal{F}_t\bigr). \tag{10.7}$$


According to (10.3), the firm’s equity corresponds to a European call on $(V_t)$
with exercise price B and maturity T. The risk-neutral value of equity obtained
from (10.7) is therefore given simply by the Black-Scholes price $C^{BS}$ of a European
call. This yields

$$S_t = C^{BS}(t, V_t; r, \sigma_V, B, T) = V_t \Phi(d_{t,1}) - B e^{-r(T-t)} \Phi(d_{t,2}), \tag{10.8}$$

where the arguments are given by

$$d_{t,1} = \frac{\ln V_t - \ln B + (r + \tfrac{1}{2}\sigma_V^2)(T - t)}{\sigma_V \sqrt{T - t}}, \qquad d_{t,2} = d_{t,1} - \sigma_V \sqrt{T - t}. \tag{10.9}$$


Next we turn to the valuation of the risky debt issued by the firm. Since we assumed
a constant interest rate r, the price at t < T of a default-free zero-coupon bond with
maturity T and a face value of one equals $p_0(t, T) = e^{-r(T-t)}$. According to (10.4)
we have

$$B_t = B p_0(t, T) - P^{BS}(t, V_t; r, \sigma_V, B, T), \tag{10.10}$$

where $P^{BS}(t, V; r, \sigma_V, B, T)$ denotes the Black-Scholes price of a European put
with strike B, maturity T on $(V_t)$ for given interest rate r, and volatility $\sigma_V$. It is
well known that

$$P^{BS}(t, V_t; r, \sigma_V, B, T) = B e^{-r(T-t)} \Phi(-d_{t,2}) - V_t \Phi(-d_{t,1}), \tag{10.11}$$

with $d_{t,1}$ and $d_{t,2}$ as in (10.9). Combining (10.10) and (10.11) we get

$$B_t = p_0(t, T) B \Phi(d_{t,2}) + V_t \Phi(-d_{t,1}). \tag{10.12}$$


Lines showing the evolution of $B_t$ as a function of the evolution of $V_t$ under the
assumption that r = 0.02 have been added to Figure 10.3. The difference between
the asset value $V_t$ and the debt $B_t$ is the value of equity $S_t$; note how the value of
equity is essentially negligible for t > 0.8 in the default path.
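Formulas (10.8), (10.9) and (10.12) can be coded directly. In the sketch below the function name and the argument `tau` (standing for the time to maturity T - t) are illustrative choices; the inputs are the values used for Figure 10.3 at t = 0.

```python
from math import exp, log, sqrt
from statistics import NormalDist

PHI = NormalDist().cdf   # standard normal distribution function

def merton_equity_and_debt(v_t, b, r, sigma, tau):
    """Return (S_t, B_t) from (10.8) and (10.12); tau = T - t must be positive."""
    d1 = (log(v_t / b) + (r + 0.5 * sigma ** 2) * tau) / (sigma * sqrt(tau))
    d2 = d1 - sigma * sqrt(tau)
    p0 = exp(-r * tau)                                 # default-free bond price
    equity = v_t * PHI(d1) - b * p0 * PHI(d2)          # Black-Scholes call (10.8)
    debt = p0 * b * PHI(d2) + v_t * PHI(-d1)           # risky debt (10.12)
    return equity, debt

equity, debt = merton_equity_and_debt(v_t=1.0, b=0.85, r=0.02, sigma=0.25, tau=1.0)
```

As a consistency check, `equity + debt` equals the asset value, and `debt` lies below the default-free value `0.85 * exp(-0.02)` by the value of the put in (10.10).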


Volatility of the firm’s equity. It is interesting to compute the volatility of the equity
of the firm under Assumption 10.3. To this end we define the quantity

$$\nu(t, V_t) := \frac{V_t\, C_V^{BS}(t, V_t)}{C^{BS}(t, V_t)},$$

where subscripts on $C^{BS}$ denote partial derivatives. In the context of option pricing
this is known as the elasticity of a European call with respect to the price of the
underlying security. In our context it measures the percentage change in the value of
equity per percentage change in the value of the underlying assets.

If we apply Itô’s formula to $S_t = C^{BS}(t, V_t; r, \sigma_V, B, T)$ we obtain

$$dS_t = \sigma_V C_V^{BS}(t, V_t) V_t\,dW_t + \bigl(C_t^{BS}(t, V_t) + \mu_V C_V^{BS}(t, V_t) V_t + \tfrac{1}{2}\sigma_V^2 V_t^2 C_{VV}^{BS}(t, V_t)\bigr)\,dt.$$

Using the definition of the elasticity $\nu$, we may rewrite the $dW_t$ term in the form

$$\sigma_V C_V^{BS}(t, V_t) V_t\,dW_t = \sigma_V \nu(t, V_t) C^{BS}(t, V_t)\,dW_t,$$

from which we conclude that the volatility of the firm’s equity at time t is a function
$\sigma_S(t, V_t)$ of time and of the current asset value $V_t$ that takes the form

$$\sigma_S(t, V_t) = \nu(t, V_t)\,\sigma_V. \tag{10.13}$$

The volatility of the firm’s equity is therefore greater than $\sigma_V$, since the elasticity of
a European call is always greater than one.
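A short numerical sketch of (10.13) follows. It relies on the standard fact, not derived in the text, that the derivative of the Black-Scholes call price with respect to the underlying is $\Phi(d_{t,1})$; the function name and parameter values are illustrative assumptions.

```python
from math import exp, log, sqrt
from statistics import NormalDist

PHI = NormalDist().cdf

def equity_volatility(v_t, b, r, sigma, tau):
    """Equity volatility (10.13): elasticity of the call times asset volatility."""
    d1 = (log(v_t / b) + (r + 0.5 * sigma ** 2) * tau) / (sigma * sqrt(tau))
    d2 = d1 - sigma * sqrt(tau)
    call = v_t * PHI(d1) - b * exp(-r * tau) * PHI(d2)
    elasticity = v_t * PHI(d1) / call      # uses the call delta PHI(d1)
    return elasticity * sigma

sigma_equity = equity_volatility(v_t=1.0, b=0.85, r=0.02, sigma=0.25, tau=1.0)
sigma_equity_low_debt = equity_volatility(v_t=1.0, b=0.30, r=0.02, sigma=0.25, tau=1.0)
```

The elasticity exceeds one because the call price equals $V_t \Phi(d_{t,1})$ minus a positive term, so both values are above the asset volatility of 0.25; the effect is stronger for the more highly leveraged firm.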


Risk-neutral and physical default probabilities. Next we compare physical and
risk-neutral default probabilities in Merton’s model. It is a basic result from financial
mathematics that under the risk-neutral measure Q the process $(V_t)$ satisfies the
stochastic differential equation (SDE) $dV_t = rV_t\,dt + \sigma_V V_t\,d\tilde{W}_t$ for a
standard Q-Brownian motion $\tilde{W}$. Note how the drift $\mu_V$ in (10.5) has been
replaced by the risk-free interest rate r. The risk-neutral default probability is therefore
given by the formula (10.6), evaluated with $\mu_V = r$:

$$q = Q(V_T \le B) = \Phi\biggl(\frac{\ln B - \ln V_0 - (r - \tfrac{1}{2}\sigma_V^2)T}{\sigma_V \sqrt{T}}\biggr).$$

Comparing this with the physical default probability $p = P(V_T \le B)$ as given in
(10.6) we obtain the relationship

$$q = \Phi\Bigl(\Phi^{-1}(p) + \frac{\mu_V - r}{\sigma_V}\sqrt{T}\Bigr). \tag{10.14}$$

The correction term $(\mu_V - r)/\sigma_V$ equals the Sharpe ratio of V (a popular measure
of the risk premium earned by the firm). The transition formula (10.14) is sometimes
applied in practice to go from physical to risk-neutral default probabilities. Note,
however, that (10.14) is supported by theoretical arguments only in the narrow
context of the Merton model.
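The transition formula (10.14) can be checked numerically against the direct evaluation of (10.6) with drift r. The following sketch does this for illustrative parameter values; the function and variable names are assumptions of the example.

```python
from math import log, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()

def risk_neutral_pd(p, mu, r, sigma, horizon):
    """Map a physical default probability p to a risk-neutral one via (10.14)."""
    shift = (mu - r) / sigma * sqrt(horizon)       # Sharpe ratio times sqrt(T)
    return STD_NORMAL.cdf(STD_NORMAL.inv_cdf(p) + shift)

def default_probability(v0, b, drift, sigma, horizon):
    """Formula (10.6) for a general drift."""
    d = (log(b / v0) - (drift - 0.5 * sigma ** 2) * horizon) / (sigma * sqrt(horizon))
    return STD_NORMAL.cdf(d)

v0, b, mu, r, sigma, horizon = 1.0, 0.85, 0.03, 0.02, 0.25, 1.0
p = default_probability(v0, b, mu, sigma, horizon)         # physical
q_direct = default_probability(v0, b, r, sigma, horizon)   # risk-neutral, drift r
q_mapped = risk_neutral_pd(p, mu, r, sigma, horizon)       # via (10.14)
```

Both routes give the same value of q, and q exceeds p whenever the drift of the asset value is larger than the risk-free rate.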


Credit spread. We may use (10.12) to infer the credit spread c(t, T) implied by
Merton’s model. The credit spread measures the difference between the continuously
compounded yield to maturity of a defaultable zero-coupon bond $p_1(t, T)$ and that
of a default-free zero-coupon bond $p_0(t, T)$. It is defined by

$$c(t, T) = \frac{-1}{T - t}\bigl(\ln p_1(t, T) - \ln p_0(t, T)\bigr) = \frac{-1}{T - t} \ln \frac{p_1(t, T)}{p_0(t, T)}. \tag{10.15}$$

Throughout the book we use the convention that a zero-coupon bond has a nominal
value equal to 1. In line with this convention we assume that the pay-off at T of a
zero-coupon bond issued by the firm is given by $(1/B) B_T$, so that the price of such
a bond at time t < T is given by $p_1(t, T) = (1/B) B_t$. We therefore obtain

$$c(t, T) = \frac{-1}{T - t} \ln\Bigl(\Phi(d_{t,2}) + \frac{V_t}{B p_0(t, T)} \Phi(-d_{t,1})\Bigr). \tag{10.16}$$

Since $d_{t,1}$ can be rewritten as

$$d_{t,1} = \frac{-\ln(B p_0(t, T)/V_t) + \tfrac{1}{2}\sigma_V^2 (T - t)}{\sigma_V \sqrt{T - t}},$$

and similarly for $d_{t,2}$, we conclude that, for a fixed time to maturity T - t, the spread
c(t, T) depends only on the volatility $\sigma_V$ and on the ratio $d := B p_0(t, T)/V_t$, which
is the ratio of the discounted nominal value of the firm’s debt to the value of the
firm’s assets and is hence a measure of the relative debt level or leverage of the
firm. As the price of a European put (10.11) is increasing in the volatility, it follows
from (10.10) that c(t, T) is increasing in $\sigma_V$. In Figure 10.4 we plot the credit spread
as a function of $\sigma_V$ and of the time to maturity $\tau = T - t$.
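Since the spread depends only on the leverage measure d, the volatility and the time to maturity, it can be written as a function of these three inputs. The sketch below evaluates (10.16) in this form; the function name is hypothetical and the inputs are those quoted for Figure 10.4.

```python
from math import log, sqrt
from statistics import NormalDist

PHI = NormalDist().cdf

def merton_credit_spread(leverage, sigma, tau):
    """Credit spread (10.16) for leverage d = B p_0(t, T) / V_t and tau = T - t."""
    d1 = (-log(leverage) + 0.5 * sigma ** 2 * tau) / (sigma * sqrt(tau))
    d2 = d1 - sigma * sqrt(tau)
    return -log(PHI(d2) + PHI(-d1) / leverage) / tau

spread_two_years = merton_credit_spread(leverage=0.6, sigma=0.25, tau=2.0)
spread_three_months = merton_credit_spread(leverage=0.6, sigma=0.25, tau=0.25)
spread_high_vol = merton_credit_spread(leverage=0.6, sigma=0.40, tau=2.0)
```

For d = 0.6 and a volatility of 0.25 the two-year spread is of the order of three quarters of a per cent, while the three-month spread is positive but negligibly small.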


Extensions. Merton’s model is quite simplistic. Over the years this has given rise
to a rich literature on firm-value models. We briefly comment on the most important
research directions (bibliographic references are given in Notes and Comments).
To begin with, the observation that, in reality, firms can default at essentially any
time (and not only at a deterministic point in time T) has led to the development
of so-called first-passage-time models. In this class of models default occurs when
the asset-value process crosses a default threshold B for the first time; the threshold
is usually interpreted as the average value of the liabilities. Formally, the default
time $\tau$ is defined by $\tau = \inf\{t \ge 0 : V_t \le B\}$. Further technical developments
include models with stochastic default-free interest rates and models where the
asset-value process $(V_t)$ is given by a diffusion with jumps.

Figure 10.4. Credit spread c(t, T) in per cent as a function of (a) the firm’s volatility
$\sigma_V$ and (b) the time to maturity $\tau = T - t$ for fixed leverage measure d = 0.6 (in (a)
$\tau = 2$ years; in (b) $\sigma_V = 0.25$). Note that, for a time to maturity smaller than
approximately three months, the credit spread implied by Merton’s model is basically equal
to zero. This is not in line with most empirical studies of corporate bond spreads and has
given rise to a number of extensions of Merton’s model that are listed in Notes and
Comments. We will see in Section 10.5.3 that reduced-form models lead to a more reasonable
behaviour of short-term credit spreads.

Firm-value models with an endogenous default threshold are an interesting eco- 
nomic extension of Merton’s model. Here the default boundary B is not fixed a priori 
by the modeller but is determined endogenously by strategic considerations of the 
shareholders. Finally, structural models with incomplete information on asset value 
and/or liabilities provide an important link between the structural and reduced-form 
approaches to credit risk modelling. 


10.3.3 Structural Models in Practice: EDF and DD 


There are a number of industry models that descend from the Merton model. An 
important example is the so-called public-firm EDF model that is maintained by 
Moody’s Analytics. The acronym EDF stands for expected default frequency; this is 
a specific estimate of the physical default probability of a given firm over a one-year 
horizon. The methodology proposed by Moody’s Analytics builds on earlier work 
by KMV (a private company named after its founders Kealhofer, McQuown and 
Vasicek) in the 1990s, and is also known as the KMV model. Our presentation of 
the public-firm EDF model is based on Crosbie and Bohn (2002) and Sun, Munves 
and Hamilton (2012). We concentrate on the main ideas, since detailed information 
about actual implementation and calibration procedures is proprietary and these 
procedures may change over time. 


Overview. Recall that in the classic Merton model the one-year default probability
of a given firm is given by the probability that the asset value in one year
lies below the threshold B representing the overall liabilities of the firm. Under
Assumption 10.3, the one-year default probability is a function of the current asset
value $V_0$, the (annualized) drift $\mu_V$ and volatility $\sigma_V$ of the asset-value process, and
the threshold B; using (10.6) with T = 1 and recalling that $\Phi(d) = 1 - \Phi(-d)$ we
infer that

$$\mathrm{EDF}_{\text{Merton}} = 1 - \Phi\biggl(\frac{\ln V_0 - \ln B + (\mu_V - \tfrac{1}{2}\sigma_V^2)}{\sigma_V}\biggr). \tag{10.17}$$

In the public-firm EDF model a similar structure is assumed for the EDF; however,
$1 - \Phi$ is replaced by some empirically estimated function, B is replaced by a new
default threshold $\tilde{B}$ representing the structure of the firm’s liabilities more closely,
and the term $(\mu_V - \tfrac{1}{2}\sigma_V^2)$ in the numerator is sometimes omitted for expositional
ease. Moreover, the current asset value $V_0$ and the asset volatility $\sigma_V$ are inferred
(or “backed out”) from information about the firm’s equity value.


Determination of the asset value and the asset volatility. Firm-value-based credit
risk models are based on the market value $V_0$ of the firm’s assets. This makes sense
as the market value is a forward-looking measure that reflects investor expectations
about the business prospects and future cash flows of the firm. Unfortunately, in
contrast to the assumptions underlying Merton’s model, in most cases there is no market
for the assets of a firm, so that the asset value is not directly observable. Moreover,
the market value can differ greatly from the value of a company as measured by
accountancy rules (the so-called book value), so that accounting information and
balance sheet data are of limited use in inferring the asset value $V_0$. For these reasons
the public-firm EDF model relies on an indirect approach and infers values of $V_t$
at different times t from the more easily observed values of a firm’s equity $S_t$. This
approach simultaneously provides estimates of $V_0$ and of the asset volatility $\sigma_V$. The
latter estimate is needed since $\sigma_V$ has a strong impact on default probabilities; all
other things being equal, risky firms with a comparatively high asset volatility $\sigma_V$
have a higher default probability than firms with a low asset volatility.

We explain the estimation approach in the context of the Merton model. Recall
that under Assumption 10.3 we have that

$$S_t = C^{BS}(t, V_t; r, \sigma_V, B, T). \tag{10.18}$$

Obviously, at a fixed point in time, t = 0 say, (10.18) is an equation with two
unknowns, $V_0$ and $\sigma_V$. To overcome this difficulty, one may use an iterative procedure.
In step (1), (10.18) with some initial estimate $\sigma_V^{(0)}$ is used to infer a time series
of asset values $(V_t^{(1)})$ from equity values. Then a new volatility estimate $\sigma_V^{(1)}$ is
estimated from this time series; a new time series $(V_t^{(2)})$ is then constructed using
(10.18) with $\sigma_V^{(1)}$. This procedure is iterated n times, until the volatility estimates
$\sigma_V^{(n-1)}$ and $\sigma_V^{(n)}$ generated in step (n - 1) and step (n) are sufficiently close.
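The iterative procedure can be sketched as follows for the plain Merton model. Several simplifications are assumptions of this example and not part of the text: the time to maturity `tau` is held fixed for every observation, (10.18) is inverted by bisection, the volatility is estimated as the annualized sample standard deviation of log-returns, and the equity series is synthetic, generated from a simulated asset path with a known volatility of 0.25.

```python
import random
from math import exp, log, sqrt
from statistics import NormalDist, stdev

PHI = NormalDist().cdf

def bs_call(v, b, r, sigma, tau):
    """Black-Scholes call price, i.e. the equity value in (10.18)."""
    d1 = (log(v / b) + (r + 0.5 * sigma ** 2) * tau) / (sigma * sqrt(tau))
    return v * PHI(d1) - b * exp(-r * tau) * PHI(d1 - sigma * sqrt(tau))

def implied_asset_value(s, b, r, sigma, tau):
    """Solve s = bs_call(v, ...) for v by bisection (the price is increasing in v)."""
    lo, hi = s, s + b                   # bounds follow from v - b <= call <= v
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if bs_call(mid, b, r, sigma, tau) < s:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def annualized_volatility(values, dt):
    returns = [log(y / x) for x, y in zip(values, values[1:])]
    return stdev(returns) / sqrt(dt)

def estimate_asset_volatility(equity, b, r, tau, dt,
                              sigma_init=0.5, tol=1e-6, max_iter=100):
    """Alternate between backing out asset values and re-estimating volatility."""
    sigma = sigma_init
    for _ in range(max_iter):
        assets = [implied_asset_value(s, b, r, sigma, tau) for s in equity]
        sigma_new = annualized_volatility(assets, dt)
        if abs(sigma_new - sigma) < tol:
            return sigma_new, assets
        sigma = sigma_new
    return sigma, assets

# Synthetic data: one year of daily asset values and the implied equity values.
rng = random.Random(1)
dt, true_sigma, b, r, tau = 1 / 250, 0.25, 0.6, 0.02, 1.0
assets_true = [1.0]
for _ in range(250):
    shock = rng.gauss(0.0, 1.0)
    assets_true.append(assets_true[-1] * exp(
        (0.03 - 0.5 * true_sigma ** 2) * dt + true_sigma * sqrt(dt) * shock))
equity = [bs_call(v, b, r, true_sigma, tau) for v in assets_true]

sigma_hat, assets_hat = estimate_asset_volatility(equity, b, r, tau, dt)
```

On such synthetic data the procedure should recover a volatility close to the one used in the simulation, up to the sampling error of a one-year volatility estimate.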

In the public-firm EDF model, the capital structure of the firm is modelled in
a more sophisticated manner than in Merton’s model. There are several classes
of liabilities, such as long- and short-term debt and convertible bonds, the model
allows for intermediate cash payouts corresponding to coupons and dividends, and
default can occur at any time. Moreover, the default point (the threshold value $\tilde{B}$
such that the company defaults if $(V_t)$ falls below $\tilde{B}$) is determined from a more
detailed analysis of the term structure of the firm’s debt. The equity value is thus no
longer given by (10.18) but by some different function $f(t, V_t, \sigma_V)$, which has to
be computed numerically. The general idea of the approach used to estimate $V_0$ and
$\sigma_V$ is, however, exactly as described above.


Calculation of EDFs. In the Merton model, default occurs if the value of a firm’s 
assets falls below the value of its liabilities. With lognormally distributed asset 
values, as implied for instance by Assumption 10.3, this leads to default probabilities 
of the form $\mathrm{EDF}_{\text{Merton}}$ as in (10.17). This relationship between asset value and
default probability may be too simplistic to be an accurate description of actual 
default probabilities. For instance, asset values are not necessarily lognormal but 
might follow a distribution with heavy tails and there might be payments due at an 
intermediate point in time causing default at that date. 

For these reasons, in the public-firm EDF model a new state variable is introduced 
in an intermediate step. This is the so-called distance-to-default (DD), given by 


$$\mathrm{DD} := (\log V_0 - \log \tilde{B})/\sigma_V. \tag{10.19}$$


Here, $\tilde{B}$ represents the default threshold; in some versions of the model $\tilde{B}$ is modelled
as the sum of the liabilities payable within one year and half of the longer-term
debt. Sometimes practitioners call the distance-to-default the “number of standard
deviations a company is away from its default threshold”. Note that (10.19) is in fact
an approximation of the argument of (10.17), since $\mu_V$ and $\sigma_V^2$ are usually small.

In the EDF methodology it is assumed that the distance-to-default ranks firms
in the sense that firms with a higher DD exhibit a lower default probability. The
functional relationship between DD and EDF is determined empirically; using a
database of historical default events, the proportion of firms with DD in a given
small range that default within a year is estimated. This proportion is the empirically
estimated EDF. The DD-to-EDF mapping exhibits “heavy tails”: for high-quality
firms with a large DD the empirically estimated EDF is much higher than
$\mathrm{EDF}_{\text{Merton}}$ as given in (10.17). For instance, for a firm with a DD equal to 4 we find
that $\mathrm{EDF}_{\text{Merton}} \approx 0.003\%$, whereas the empirically estimated EDF equals 0.4%.

In Table 10.3 we illustrate the computation of the EDF for two different firms, 
Johnson & Johnson (a well-capitalized firm that operates in the relatively stable 
health care market) and RadioShack (a firm that is active in the highly volatile 
consumer electronics business). If we compare the numbers, we see that the EDF 
for Johnson & Johnson is close to zero whereas the EDF for RadioShack is quite 
high. This difference reflects the higher leverage of RadioShack and the riskiness 
of the underlying business, as reflected by the comparatively large asset volatility 
$\sigma_V = 24\%$. Indeed, on 11 September 2014, the New York Times reported that a
bankruptcy filing for RadioShack could be near, suggesting that the EDF had good 
predictive power in this case. 




Table 10.3. A summary of the public-firm EDF methodology. The example is taken from
Sun, Munves and Hamilton (2012); it is concerned with the situation of Johnson & Johnson
(J&J) and RadioShack as of April 2012. All quantities are in US dollars.


Variable                          J&J      RadioShack   Notes
Market value of assets $V_0$      236 bn   1834 m       Determined from time series of equity prices
Asset volatility $\sigma_V$       11%      24%          Determined from time series of equity prices
Default threshold $\tilde{B}$     39 bn    1042 m       Short-term liabilities and half of long-term liabilities
DD                                16.4     2.3          Given by $(\log V_0 - \log \tilde{B})/\sigma_V$
EDF (one year)                    0.01%    3.58%        Determined using empirical mapping between DD and EDF
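The DD row of Table 10.3 can be recomputed from the other rows with the distance-to-default formula (log of asset value minus log of default threshold, divided by asset volatility). The sketch below also evaluates the Merton-type default probability for a DD of 4, ignoring the drift term; the function name is a hypothetical choice. The empirical DD-to-EDF mapping itself is proprietary and is not reproduced here.

```python
from math import log
from statistics import NormalDist

def distance_to_default(asset_value, threshold, sigma):
    """Distance-to-default: (log V_0 - log threshold) / sigma_V."""
    return (log(asset_value) - log(threshold)) / sigma

# Inputs taken from Table 10.3 (US dollars).
dd_jj = distance_to_default(236e9, 39e9, 0.11)
dd_radioshack = distance_to_default(1834e6, 1042e6, 0.24)

# Merton-type default probability for a firm with DD = 4, drift term ignored.
edf_merton_dd4 = 1.0 - NormalDist().cdf(4.0)
```

The computed values are close to the tabulated 16.4 and 2.3; small differences may stem from rounding of the published inputs. The last quantity is about 0.003%, far below the empirically estimated EDF of 0.4% quoted in the text for a firm with a DD of 4.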


10.3.4 Credit-Migration Models Revisited 


Recall that in the credit-migration approach each firm is assigned to a credit-rating 
category at any given time point. There are a finite number of such ratings and they 
are ordered by credit quality and include the category of default. The probability of 
moving from one credit rating to another credit rating over the given risk horizon 
(typically one year) is then specified. In this section we explain how a migration 
model can be embedded in a firm-value model and thus be treated as a structural 
model. This will be useful in the discussion of portfolio versions of these models in 
Chapter 11. Moreover, we compare the public-firm EDF model and credit-migration 
approaches. 


Credit-migration models as firm-value models. We consider a firm that has been
assigned to some non-default rating category j at t = 0 and for which transition
probabilities $p_{jk}$, $0 \le k \le n$, over the period [0, T] are available on the basis of that
rating. These express the probability that the firm belongs to rating class k at the
time horizon T, given that it is in class j at t = 0. In particular, $p_{j0}$ is the default
probability of the firm over [0, T].
Suppose that the asset-value process $(V_t)$ of the firm follows the model given
in (10.5), so that

$$V_T = V_0 \exp\bigl((\mu_V - \tfrac{1}{2}\sigma_V^2)T + \sigma_V W_T\bigr) \tag{10.20}$$

is lognormally distributed. We can now choose thresholds

$$0 = \tilde{d}_0 < \tilde{d}_1 < \cdots < \tilde{d}_n < \tilde{d}_{n+1} = \infty \tag{10.21}$$

such that $P(\tilde{d}_k < V_T \le \tilde{d}_{k+1}) = p_{jk}$ for $k \in \{0, \ldots, n\}$. We have therefore
translated the transition probabilities into a series of thresholds for an assumed asset-value
process. The threshold $\tilde{d}_1$ is the default threshold, which in the Merton model of
Section 10.3.1 was interpreted as the value of the firm’s liabilities. The higher thresholds
are the asset-value levels that mark the boundaries of higher rating categories. The
firm-value model in which we have embedded the migration model can be summarized
by saying that the firm belongs to rating class k at the time horizon T if and
only if $\tilde{d}_k < V_T \le \tilde{d}_{k+1}$.

The migration probabilities in the firm-value model obviously remain invariant
under simultaneous strictly increasing transformations of $V_T$ and the thresholds $\tilde{d}_k$.
If we define

$$X_T := \frac{\ln V_T - \ln V_0 - (\mu_V - \tfrac{1}{2}\sigma_V^2)T}{\sigma_V \sqrt{T}}, \tag{10.22}$$
$$d_k := \frac{\ln \tilde{d}_k - \ln V_0 - (\mu_V - \tfrac{1}{2}\sigma_V^2)T}{\sigma_V \sqrt{T}}, \tag{10.23}$$

then we can also say that the firm belongs to rating class k at the time horizon T if
and only if $d_k < X_T \le d_{k+1}$. Observe that $X_T$ is a standardized version of the
asset-value log-return $\ln V_T - \ln V_0$, and we can easily verify that $X_T = W_T/\sqrt{T}$ so that
it has a standard normal distribution. In this case the formulas for the thresholds are
easily obtained and are $d_k = \Phi^{-1}\bigl(\sum_{l=0}^{k-1} p_{jl}\bigr)$ for k = 1, ..., n.
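The standardized thresholds are obtained by applying the standard normal quantile function to cumulative transition probabilities. The sketch below does this for a hypothetical row of transition probabilities (ordered from default, k = 0, to the best rating); the function name and the numbers are illustrative assumptions.

```python
from itertools import accumulate
from statistics import NormalDist

STD_NORMAL = NormalDist()

def rating_thresholds(transition_probs):
    """Return [d_1, ..., d_n] for transition probabilities [p_j0, ..., p_jn].

    d_k is the standard normal quantile of p_j0 + ... + p_j,k-1. Each cumulative
    sum must lie strictly between 0 and 1, so p_j0 must be positive.
    """
    cumulative = list(accumulate(transition_probs))[:-1]
    return [STD_NORMAL.inv_cdf(c) for c in cumulative]

# Hypothetical one-year probabilities of ending in classes 0 (default), 1, 2, 3.
probs = [0.01, 0.04, 0.15, 0.80]
thresholds = rating_thresholds(probs)

# Probability mass that a standard normal variable places on (d_1, d_2].
mass_class_1 = STD_NORMAL.cdf(thresholds[1]) - STD_NORMAL.cdf(thresholds[0])
```

By construction the mass between consecutive thresholds reproduces the corresponding transition probability, here 0.04 for class 1.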


The public-firm EDF model and credit-migration approaches compared. The 
public-firm EDF model uses market data, most notably the current stock price, 
as inputs for the EDF computation. The EDF therefore reacts quickly to changes in 
the economic prospects of a firm, as these are reflected in the firm’s share price and 
hence in the estimated distance-to-default. Moreover, EDFs are quite sensitive to the 
current macroeconomic environment. The distance-to-default is observed to rise in 
periods of economic expansion (essentially due to higher share prices reflecting bet- 
ter economic conditions) and to decrease in recession periods. Rating agencies, on 
the other hand, are typically slow in adjusting their credit ratings, so that the current 
rating does not always reflect the economic condition of a firm. This is particularly 
important if the credit quality of a firm deteriorates rapidly, as is typically the case 
with companies that are close to default. For instance, the investment bank Lehman 
Brothers had a fairly good rating (Aa or better) when it defaulted in September 2008. 
EDFs might therefore be better predictors of default probabilities over short time 
horizons. 

On the other hand, the public-firm EDF model is quite sensitive to global over- 
and underreaction of equity markets. In particular, the bursting of a stock market 
bubble may lead to drastically increased EDFs, even if the economic outlook for a 
given corporation has not changed very much. This can lead to huge fluctuations 
in the amount of risk capital that is required to back a given credit portfolio. From 
this point of view the relative inertia of ratings-based models could be considered 
an advantage, as the ensuing risk capital requirements tend to be more stable over 
time. 


Notes and Comments 


There are many excellent texts, at varying technical levels, in which the basic math- 
ematical finance results used in Section 10.3.2 can be found. Models in discrete time 
are discussed in Cox and Rubinstein (1985) and Jarrow and Turnbull (1999); excel- 
lent introductions to continuous-time models include Baxter and Rennie (1996), 
Björk (2004), Bingham and Kiesel (2004) and Shreve (2004b).

Lando (2004) gives a good overview of the rich literature on firm-value models.
First-passage-time models have been considered by, among others, Black and
Cox (1976) and, in a set-up with stochastic interest rates, Longstaff and Schwartz
(1995). The problem of the unrealistically low credit spreads for small maturities
$\tau = T - t$, which we pointed out in Figure 10.4, has also led to extensions of Merton’s
model. Partial remedies within the class of firm-value models include models
with jumps in the firm value, as in Zhou (2001), time-varying default thresholds,
as in Hull and White (2001), stochastic volatility models for the firm-value process
with time-dependent dynamics, as in Overbeck and Schmidt (2005), and incomplete
information on firm value or default threshold, as in Duffie and Lando (2001), Frey
and Schmidt (2009) and Çetin (2012). Models with endogenous default thresholds
have been considered by, among others, Leland (1994), Leland and Toft (1996) and 
Hilberink and Rogers (2002). 

Duffie and Lando (2001) established a relationship between firm-value models 
and reduced-form models in continuous time. Essentially, they showed that, from 
the perspective of investors with incomplete accounting information (i.e. incomplete 
information about the assets or liabilities of a firm), a firm-value model becomes 
a reduced-form model. A less technical discussion of these issues can be found in 
Jarrow and Protter (2004). 

The public-firm EDF model was first described in Crosbie and Bohn (2002); the 
model variant that is currently in use is described in Dwyer and Qu (2007) and Sun, 
Munves and Hamilton (2012). 


10.4 Bond and CDS Pricing in Hazard Rate Models 


Hazard rate models are the most basic reduced-form credit risk models and are 
therefore a natural starting point for our discussion of this model class. Moreover, 
hazard rate models are used as an input in the construction of the popular copula 
models for portfolio credit derivatives. For these reasons this section is devoted to 
bond and CDS pricing in hazard rate models. We begin by introducing the necessary 
mathematical background in Section 10.4.1. Since the pricing results that we present 
in this section rely on the concept of risk-neutral pricing and martingale modelling, 
we briefly review these notions in Section 10.4.2. The pricing of bonds and CDSs 
and some of the related empirical evidence is discussed in Sections 10.4.3, 10.4.4 
and 10.4.5. 


10.4.1 Hazard Rate Models 


A hazard rate model is a model in which the distribution of the default time of an 
obligor is directly specified by a hazard function without modelling the mechanism 
by which default occurs. 

To set up a hazard rate model we consider a probability space $(\Omega, \mathcal{F}, P)$ and a
random time $\tau$ defined on this space, i.e. an $\mathcal{F}$-measurable rv taking values in $[0, \infty]$.
In economic terms, $\tau$ can be interpreted as the default time of some company.
We denote the df of $\tau$ by $F(t) = P(\tau \le t)$ and the tail or survival function by
$\bar F(t) = 1 - F(t)$; we assume that $P(\tau = 0) = F(0) = 0$, and that $\bar F(t) > 0$ for all
$t < \infty$. We define the jump or default indicator process $(Y_t)$ associated with $\tau$ by

$$Y_t = I_{\{\tau \le t\}}, \quad t \ge 0. \tag{10.24}$$

Note that $(Y_t)$ is a right-continuous process that jumps from 0 to 1 at the default
time $\tau$ and that $1 - Y_t = I_{\{\tau > t\}}$ is the survival indicator of the firm.


Definition 10.4 (cumulative hazard function and hazard function). The function
$\Gamma(t) := -\ln(\bar F(t))$ is called the cumulative hazard function of the random time $\tau$. If
$F$ is absolutely continuous with density $f$, the function $\gamma(t) := f(t)/(1 - F(t)) = f(t)/\bar F(t)$
is called the hazard function of $\tau$.


By definition we have $\bar F(t) = e^{-\Gamma(t)}$. If $F$ has density $f$, we calculate that
$\Gamma'(t) = f(t)/\bar F(t) = \gamma(t)$, so that we can represent the survival function of $\tau$ in
terms of the hazard function by

$$\bar F(t) = \exp\Bigl(-\int_0^t \gamma(s)\,ds\Bigr). \tag{10.25}$$

The hazard function $\gamma(t)$ at a fixed time $t$ gives the hazard rate at $t$, which can be
interpreted as a measure of the instantaneous risk of default at $t$, given survival up to
time $t$. In fact, for $h > 0$ we have $P(\tau \le t + h \mid \tau > t) = (F(t + h) - F(t))/\bar F(t)$.
Hence we obtain

$$\lim_{h \to 0} \frac{1}{h} P(\tau \le t + h \mid \tau > t) = \frac{1}{\bar F(t)} \lim_{h \to 0} \frac{F(t + h) - F(t)}{h} = \gamma(t).$$

For illustrative purposes we determine the hazard function for the Weibull distribution.
This is a popular distribution for survival times with df $F(t) = 1 - e^{-\lambda t^\alpha}$
for parameters $\lambda, \alpha > 0$. For $\alpha = 1$ the Weibull distribution reduces to the
exponential distribution with parameter $\lambda$. Differentiation yields

$$f(t) = \lambda\alpha t^{\alpha - 1} e^{-\lambda t^\alpha} \quad\text{and}\quad \gamma(t) = \lambda\alpha t^{\alpha - 1}.$$

In particular, $\gamma$ is decreasing in $t$ if $\alpha < 1$ and increasing if $\alpha > 1$. For $\alpha = 1$
(exponential distribution) the hazard rate is time independent and equal to the
parameter $\lambda$.
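
The Weibull example lends itself to a small numerical check of the representation (10.25). The following Python sketch is illustrative only: the function names and the parameter values are our own choices, and the integral of the hazard function is approximated with a simple midpoint rule rather than any particular library routine.

```python
import math

def weibull_survival(t, lam, alpha):
    """Survival function exp(-lam * t**alpha) of the Weibull distribution."""
    return math.exp(-lam * t ** alpha)

def weibull_hazard(t, lam, alpha):
    """Hazard function lam * alpha * t**(alpha - 1)."""
    return lam * alpha * t ** (alpha - 1)

def survival_from_hazard(t, lam, alpha, steps=20000):
    """Approximate exp(-int_0^t gamma(s) ds) with the midpoint rule."""
    h = t / steps
    integral = sum(weibull_hazard((i + 0.5) * h, lam, alpha) for i in range(steps)) * h
    return math.exp(-integral)

# Illustrative parameter values (not calibrated to any data).
lam, alpha, t = 0.05, 1.5, 3.0
closed_form = weibull_survival(t, lam, alpha)
via_hazard = survival_from_hazard(t, lam, alpha)
```

Up to discretization error the two values agree, and evaluating `weibull_hazard` for shape parameters below and above one shows the decreasing and increasing hazard rates described above.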


Filtrations and conditional expectations. In financial models, filtrations are used
to model the information available to investors at various points in time. Formally,
a filtration $(\mathcal{F}_t)$ on $(\Omega, \mathcal{F})$ is an increasing family $\{\mathcal{F}_t : t \ge 0\}$ of sub-$\sigma$-algebras
of $\mathcal{F}$: $\mathcal{F}_t \subset \mathcal{F}_s \subset \mathcal{F}$ for $0 \le t \le s < \infty$. The $\sigma$-algebra $\mathcal{F}_t$ represents the state of
knowledge of an observer at time $t$, and $A \in \mathcal{F}_t$ is taken to mean that at time $t$ the
observer is able to determine if the event $A$ has occurred.

In this section it is assumed that the only quantity that is observable for investors is
the default event of the firm under consideration or, equivalently, the default indicator
process $(Y_t)$ associated with $\tau$. The appropriate filtration is therefore given by $(\mathcal{H}_t)$
with

$$\mathcal{H}_t = \sigma(\{Y_u : u \le t\}), \tag{10.26}$$

the default history up to and including time $t$. By definition, $\tau$ is an $(\mathcal{H}_t)$ stopping
time, as $\{\tau \le t\} = \{Y_t = 1\} \in \mathcal{H}_t$ for all $t \ge 0$; moreover, $(\mathcal{H}_t)$ is obviously the
smallest filtration with this property.

In order to study bond and CDS pricing in hazard rate models we need to compute
conditional expectations with respect to the $\sigma$-algebra $\mathcal{H}_t$. We begin our analysis of
this issue with an auxiliary result on the structure of $\mathcal{H}_t$-measurable rvs. The result
formalizes the fact that every $\mathcal{H}_t$-measurable rv can be expressed as a function of
events related to the default history at $t$.


Lemma 10.5. Every $\mathcal{H}_t$-measurable rv $H$ is of the form $H = h(\tau) I_{\{\tau \le t\}} + c I_{\{\tau > t\}}$
for a measurable function $h : [0, t] \to \mathbb{R}$ and some constant $c \in \mathbb{R}$.


Proof. The $\sigma$-algebra $\mathcal{H}_t$ is generated by the events $\{Y_u = 1\} = \{\tau \le u\}$, $u \le t$,
and $\{Y_t = 0\} = \{\tau > t\}$, and hence by the rvs $(\tau \wedge t) := \min\{\tau, t\}$ and $I_{\{\tau > t\}}$. This
implies that any $\mathcal{H}_t$-measurable rv $H$ can be written as $H = g(\tau \wedge t, I_{\{\tau > t\}})$ for
some measurable function $g : [0, t] \times \{0, 1\} \to \mathbb{R}$. The claim follows if we define
$h(u) := g(u, 0)$, $u \le t$, and $c := g(t, 1)$.


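Lemma 10.6. Let $\tau$ be a random time with jump indicator process $Y_t = I_{\{\tau \le t\}}$ and
natural filtration $(\mathcal{H}_t)$. Then, for any integrable rv $X$ and any $t \ge 0$, we have

$$E(I_{\{\tau > t\}} X \mid \mathcal{H}_t) = I_{\{\tau > t\}} \frac{E(I_{\{\tau > t\}} X)}{P(\tau > t)}. \tag{10.27}$$

Proof. Since $E(I_{\{\tau > t\}} X \mid \mathcal{H}_t)$ is $\mathcal{H}_t$-measurable and zero on $\{\tau \le t\}$, we obtain
from Lemma 10.5 that $E(I_{\{\tau > t\}} X \mid \mathcal{H}_t) = I_{\{\tau > t\}} c$ for some constant $c$. Taking
expectations yields $E(I_{\{\tau > t\}} X) = c P(\tau > t)$ and hence $c = E(I_{\{\tau > t\}} X)/P(\tau > t)$.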


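Lemma 10.6 can be used to determine conditional survival probabilities. Fix $t < T$
and consider the quantity $P(\tau > T \mid \mathcal{H}_t)$. Applying (10.27) with $X := I_{\{\tau > T\}}$ yields

$$P(\tau > T \mid \mathcal{H}_t) = E(X \mid \mathcal{H}_t) = E(I_{\{\tau > t\}} X \mid \mathcal{H}_t) = I_{\{\tau > t\}} \frac{\bar F(T)}{\bar F(t)}. \tag{10.28}$$

If $\tau$ admits the hazard function $\gamma(t)$, we get the important formula

$$P(\tau > T \mid \mathcal{H}_t) = I_{\{\tau > t\}} \exp\Bigl(-\int_t^T \gamma(s)\,ds\Bigr), \quad t < T. \tag{10.29}$$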
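
Formula (10.29) can be checked by simulation. The sketch below is a hedged illustration rather than part of the theory: it assumes a Weibull default time (so that the integrated hazard between $t$ and $T$ equals $\lambda(T^\alpha - t^\alpha)$), draws default times by inverse transform sampling with a fixed seed, and compares the empirical conditional survival frequency on the event that default has not occurred by $t$ with the formula. All function names and parameter values are our own.

```python
import math
import random

def simulate_weibull_default_times(n, lam, alpha, seed=0):
    """Draw n default times with survival function exp(-lam * t**alpha)."""
    rng = random.Random(seed)
    # Inverse transform: for U uniform on (0, 1], tau = (-ln U / lam)**(1/alpha).
    return [(-math.log(1.0 - rng.random()) / lam) ** (1.0 / alpha) for _ in range(n)]

def conditional_survival_formula(t, T, lam, alpha):
    """Right-hand side of (10.29) on the event {tau > t} for a Weibull hazard."""
    return math.exp(-lam * (T ** alpha - t ** alpha))

def conditional_survival_empirical(times, t, T):
    """Fraction of paths surviving past T among those surviving past t."""
    survivors = [tau for tau in times if tau > t]
    return sum(1 for tau in survivors if tau > T) / len(survivors)

# Illustrative parameter values.
lam, alpha, t, T = 0.05, 1.5, 1.0, 5.0
times = simulate_weibull_default_times(200_000, lam, alpha)
formula = conditional_survival_formula(t, T, lam, alpha)
empirical = conditional_survival_empirical(times, t, T)
```

With this sample size the two numbers should agree to roughly two decimal places; the remaining difference is Monte Carlo error.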


The next proposition is concerned with stochastic process properties of the jump
indicator process of a random time $\tau$.


Proposition 10.7. Let $\tau$ be a random time with absolutely continuous df $F$ and hazard
function $\gamma$. Then $M_t := Y_t - \int_0^t I_{\{\tau > u\}} \gamma(u)\,du$, $t \ge 0$, is an $(\mathcal{H}_t)$-martingale:
that is, $E(M_s \mid \mathcal{H}_t) = M_t$ for all $0 \le t \le s < \infty$.


In Section 10.5.1 we extend this result to doubly stochastic random times and 
discuss its financial and mathematical relevance. 
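
Proof. Let $s > t$. We have to show that $E(M_s - M_t \mid \mathcal{H}_t) = 0$, i.e. that
$E(Y_s - Y_t \mid \mathcal{H}_t) = E\bigl(\int_t^s \gamma(u) I_{\{u < \tau\}}\,du \mid \mathcal{H}_t\bigr)$. Using (10.28) we get

$$E(Y_s - Y_t \mid \mathcal{H}_t) = I_{\{\tau > t\}} P(\tau \le s \mid \mathcal{H}_t) = I_{\{\tau > t\}} \Bigl(1 - \frac{\bar F(s)}{\bar F(t)}\Bigr) = I_{\{\tau > t\}} \frac{\bar F(t) - \bar F(s)}{\bar F(t)}.$$

Note that $X := \int_t^s \gamma(u) I_{\{u < \tau\}}\,du$ is 0 on $\{\tau \le t\}$, so $X = X I_{\{\tau > t\}}$. Hence we
obtain from Lemma 10.6, the Fubini Theorem and the identity $\bar F'(t) = -f(t) = -\gamma(t) \bar F(t)$ that

$$E(X \mid \mathcal{H}_t) = I_{\{\tau > t\}} \frac{E(X)}{\bar F(t)} = I_{\{\tau > t\}} \frac{\int_t^s \gamma(u) \bar F(u)\,du}{\bar F(t)} = I_{\{\tau > t\}} \frac{\bar F(t) - \bar F(s)}{\bar F(t)},$$

and the result follows.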
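
A simple consequence of Proposition 10.7 is that $E(M_t) = M_0 = 0$ for every $t$. The following Python sketch checks this necessary condition by simulation in the special case of a constant hazard rate, where $M_t = Y_t - \gamma \min\{\tau, t\}$. It is only a sanity check under our own illustrative assumptions (constant hazard, fixed seed, hypothetical function names) and does not verify the full martingale property.

```python
import random

def compensated_indicator(tau, t, hazard):
    """M_t = Y_t - hazard * min(tau, t) for a constant hazard rate."""
    y_t = 1.0 if tau <= t else 0.0
    return y_t - hazard * min(tau, t)

def sample_mean_of_M(t, hazard, n=200_000, seed=1):
    """Monte Carlo estimate of E(M_t) for an exponential default time."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        tau = rng.expovariate(hazard)  # exponential default time with rate `hazard`
        total += compensated_indicator(tau, t, hazard)
    return total / n

# Illustrative values: horizon t = 2 and hazard rate 0.3.
mean_M = sample_mean_of_M(t=2.0, hazard=0.3)
```

The estimate is close to zero, up to Monte Carlo error of the order of a few thousandths for this sample size.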


10.4.2 Risk-Neutral Pricing Revisited 


The remainder of Section 10.4 is devoted to an analysis of risk-neutral pricing results 
for credit products in hazard rate models. Risk-neutral pricing has become so popular 
that the conceptual underpinnings are often overlooked. A prime case in point is the 
mechanical use of the Gauss copula to price CDO tranches, a practice that led to 
well-documented problems during the 2007-9 financial crisis. It therefore seems 
appropriate to clarify the applicability and the limitations of risk-neutral pricing in 
the context of credit risk models. 


Risk-neutral pricing. We build on the elementary discussion of risk-neutral valuation
given in Section 2.2.2. In that section we considered a simple one-period default
model for a defaultable zero-coupon bond with maturity $T$ equal to one year and
a deterministic recovery rate $1 - \delta$ equal to 60%. Moreover, we assumed that the
real-world default probability was $p = 1\%$, the risk-free simple interest rate was
$r_{0,1} = 5\%$, and the market price of the bond at $t = 0$ was $p_1(0, 1) = 0.941$.

Risk-neutral pricing is intimately linked to the notion of a risk-neutral measure. In
general terms a risk-neutral measure is an artificial probability measure $Q$, equivalent
to the historical measure $P$, such that the discounted prices of all traded securities
are $Q$-martingales (fair bets). We have seen in Section 2.2.2 that in the simple one-period
default model a risk-neutral measure $Q$ is simply given by an artificial default
probability $q$ such that

$$p_1(0, 1) = 1.05^{-1}\bigl((1 - q) \cdot 1 + q \cdot 0.6\bigr).$$

Obviously, $q$ is uniquely determined by this equation and is given by $q = 0.03$ (after rounding).
Note that in this example the risk-neutral default probability $q$ is higher than the
real-world default probability $p$. This reflects risk aversion on the part of investors
and is typical for real markets; empirical evidence on the relationship between the
risk-neutral and historical default probabilities will be presented in Section 10.4.5.
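
The risk-neutral default probability in this one-period example follows from solving the pricing equation for $q$. A minimal Python sketch, using only the numbers quoted above (the function name is our own), is:

```python
def risk_neutral_default_prob(price, rate, recovery):
    """Solve price = ((1 - q) * 1 + q * recovery) / (1 + rate) for q."""
    return (1.0 - price * (1.0 + rate)) / (1.0 - recovery)

# Bond price 0.941, simple interest rate 5%, recovery rate 60%.
q = risk_neutral_default_prob(price=0.941, rate=0.05, recovery=0.6)
real_world_p = 0.01
```

The unrounded value is $q = 0.029875$, which rounds to the 3% used in the text and is roughly three times the real-world default probability of 1%.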

The risk-neutral pricing rule states that the price of a derivative security can be
computed as the mathematical expectation of the discounted pay-off under a risk-neutral
measure $Q$. In mathematical terms the price at time $t$ of a derivative with
pay-off $H$ and maturity $T \ge t$ is thus given by

$$V_t^H = E^Q\Bigl(\exp\Bigl(-\int_t^T r_s\,ds\Bigr) H \Bigm| \mathcal{F}_t\Bigr), \tag{10.30}$$

where $r_s$ denotes the continuously compounded default-free short rate of interest at
time $s$ and where the $\sigma$-algebra $\mathcal{F}_t$ represents the information available to investors
at time $t$ (see the discussion of filtrations in Section 10.4.1). Note that in one-period
models, (10.30) reduces to the simpler expression $V_0^H = E^Q(H/(1 + r_{0,1}))$, where
$r_{0,1}$ is the simple interest rate for the period.

There are two theoretical justifications for risk-neutral pricing. One argument is 
based on absence of arbitrage: according to the first fundamental theorem of asset 
pricing, a model for security prices is arbitrage free if and only if it admits at least 
one equivalent martingale measure Q. Hence, if a financial product is to be priced in 
accordance with no-arbitrage principles, its price must be given by the risk-neutral 
pricing formula for some risk-neutral measure Q. A second justification relies on 
hedging: in financial models it is often possible to replicate the pay-off of a financial 
product by (dynamic) trading in the available assets, and in a frictionless market the 
cost of carrying out such a hedge is given by the risk-neutral pricing rule. 


Hedging and market completeness. Next we take a closer look at the concept of
hedging. We work in the simple one-period default model that was introduced in
the previous paragraph. Consider an investor, e.g. an investment bank, who plans to
sell derivatives on the defaultable zero-coupon bond. For concreteness we consider
a default put option with maturity date $T = 1$. This contract pays one unit if the
bond defaults and zero otherwise; it can be thought of as a simplified version of a
CDS. Obviously, the pay-off of the default put is unknown at date $t = 0$ and thus
constitutes a risk for the investor. A possible strategy for dealing with this risk is to
form a hedging portfolio in the defaultable bond and in cash that reduces the risk
of selling the put: suppose that at time $t = 0$ we go short 2.5 units of the bond and
hold $\tfrac{50}{21} \approx 2.38$ units of cash. At time $t = 1$ there are two possibilities for the value
$V_1$ of this portfolio.


• Default occurs: in which case $V_1 = (-2.5) \cdot 0.6 + \tfrac{50}{21} \cdot 1.05 = 1$.
• No default: in which case $V_1 = (-2.5) \cdot 1 + \tfrac{50}{21} \cdot 1.05 = 0$.


In either case the value $V_1$ of the hedge portfolio equals the pay-off of the option
and we have found a so-called replicating strategy for the option. In particular, by
forming the replicating strategy the investor completely eliminates the risk from
selling the option, and the law of one price dictates that the fair price at $t = 0$
of the option should equal the value of the hedge portfolio at $t = 0$ given by
$V_0 = (-2.5) \cdot 0.941 + \tfrac{50}{21} \approx 0.0285$ (otherwise either the buyer or the seller could
make some risk-free profit).

To construct the portfolio in this simple one-period, two-state setting we have to
consider two linear equations. Denote by $\xi_1$ and $\xi_2$ the units of the defaultable bond
and the amount of cash in our portfolio. At time $t = 1$ we must have

$$\xi_1 \cdot 0.6 + \xi_2 \cdot 1.05 = 1 \quad \text{(the default case)}, \tag{10.31}$$
$$\xi_1 \cdot 1.0 + \xi_2 \cdot 1.05 = 0 \quad \text{(the no-default case)}, \tag{10.32}$$

which leads to the above values of $\xi_1 = -2.5$ and $\xi_2 = \tfrac{50}{21}$. In mathematical finance a
derivative security is called attainable if there is a replicating portfolio strategy in the
underlying assets. The above argument shows that in the simple one-period default
model with only two states every derivative security is attainable. Such models are
termed complete.

The fair price of the default put (the initial value $V_0$ of the replicating portfolio)
can alternatively be computed by the risk-neutral pricing rule. Recall that the risk-neutral
default probability is given by $q \approx 0.03$ (more precisely, $q = 0.029875$). The risk-neutral
pricing rule applied to the default put thus leads to a value of
$(1.05)^{-1}((1 - q) \cdot 0 + q \cdot 1) \approx 0.0285$,
which is equal to $V_0$. This is, of course, not a lucky coincidence; a basic result
from mathematical finance states that the fair price of any attainable claim can 
be computed as the expected value of the discounted pay-off under a risk-neutral 
measure. Armed with this result, we typically first compute the price (the expected 
value of the discounted pay-off under a risk-neutral measure) and then determine 
the replicating strategy. For this reason a lot of research focuses on the problem of 
computing prices. However, one should bear in mind that the economic justification 
for the risk-neutral pricing rule stems partially from the hedging argument, which 
applies only to attainable claims. This issue has, to a large extent, been neglected in 
the literature on the pricing of credit-risky securities. The next example illustrates 
some of the difficulties arising in incomplete markets, where most derivatives are 
not attainable. 
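
The agreement between the replication cost and the risk-neutral price in the two-state model can be verified numerically. The Python sketch below solves the two replication equations by Cramer's rule and compares the initial cost of the hedge with the discounted risk-neutral expectation of the put pay-off; the helper `solve_two_by_two` and the variable names are our own, and the unrounded risk-neutral default probability is used.

```python
def solve_two_by_two(a11, a12, b1, a21, a22, b2):
    """Solve a11*x + a12*y = b1 and a21*x + a22*y = b2 by Cramer's rule."""
    det = a11 * a22 - a12 * a21
    if det == 0:
        raise ValueError("singular system: no unique replicating portfolio")
    x = (b1 * a22 - a12 * b2) / det
    y = (a11 * b2 - a21 * b1) / det
    return x, y

bond_price, rate, recovery = 0.941, 0.05, 0.6

# Bond units xi1 and cash xi2 that replicate the default put at t = 1.
xi1, xi2 = solve_two_by_two(recovery, 1 + rate, 1.0,   # default: pay-off 1
                            1.0, 1 + rate, 0.0)        # no default: pay-off 0
replication_cost = xi1 * bond_price + xi2

# Risk-neutral price of the put using the q implied by the bond price.
q = (1 - bond_price * (1 + rate)) / (1 - recovery)
risk_neutral_price = q * 1.0 / (1 + rate)
```

Both computations give a value of about 0.0285, in line with the statement that the fair price of an attainable claim is the expected discounted pay-off under a risk-neutral measure.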


Example 10.8 (a model with random recovery). As there is a substantial amount
of randomness in real recovery rates, it is interesting to study the impact of random
recovery rates on the validity of the above pricing arguments. We consider an extension
of the basic one-period default model in which the loss given default may be
either 30% or 50%. The price is assumed to be $p_1(0, 1) = 0.941$ and the risk-free
simple interest rate is assumed to be $r_{0,1} = 5\%$ as before. The evolution of the price
$p_1(\cdot, 1)$ is illustrated in Figure 10.5. We leave the physical measure unspecified; we
assume only that all three possible outcomes have strictly positive probability.

We begin our analysis of this model by determining the equivalent martingale
measures. Let $q_1$ be the risk-neutral probability that default occurs and the LGD
is 0.5, let $q_2$ be the risk-neutral probability that default occurs and the LGD is 0.3,
and let $q_3 = 1 - q_1 - q_2$. It follows that $q_1$ and $q_2$ satisfy the equation

$$p_1(0, 1) = 1.05^{-1}\bigl(q_1 \cdot 0.5 + q_2 \cdot 0.7 + (1 - q_1 - q_2) \cdot 1\bigr), \tag{10.33}$$

with the restrictions that $q_1 > 0$, $q_2 > 0$, $1 - q_1 - q_2 > 0$. Obviously, $Q$ is no
longer unique. It is easily seen from (10.33) that the set $\mathcal{Q}$ of equivalent martingale
measures is given by

$$\mathcal{Q} = \bigl\{q \in \mathbb{R}^3 : q_1 \in (0, 0.024),\ q_2 = \tfrac{1}{0.3}(1 - 1.05 \cdot p_1(0, 1) - 0.5 \cdot q_1),\ q_3 = 1 - (q_1 + q_2)\bigr\}. \tag{10.34}$$

It is interesting to look at the boundary cases. For $q_1 = 0$ we obtain $q_2 = 4\%$,
$q_3 = 96\%$; this is the scenario where the risk-neutral default probability $q = q_1 + q_2$
is maximized. For $q_1 = 2.4\%$ we obtain $q_2 = 0$, $q_3 = 97.6\%$; this is the scenario
where $q$ is minimized. Note, however, that the measures $q^0 := (0.024, 0, 0.976)$ and
$q^1 := (0, 0.04, 0.96)$ do not belong to $\mathcal{Q}$, as they are not equivalent to the physical
measure $P$.

Figure 10.5. Evolution of the price $p_1(\cdot, 1)$ of the defaultable bond in Example 10.8:
starting from 0.941 at $t = 0$, the price at $t = 1$ is 1.0 (no default), 0.7 (default, $\delta = 30\%$)
or 0.5 (default, $\delta = 50\%$).

Consider a derivative security with pay-off $H$ and maturity $T = 1$, such as the
default put with pay-off $H = 0$ if $p_1(1, 1) = 1$ (no default) and $H = 1$ otherwise.
Every price of the form $H_0 = E^Q(1.05^{-1} H)$ for some $Q \in \mathcal{Q}$ is consistent with
no arbitrage and will therefore be called an admissible value for the derivative. If
$\mathcal{Q}$ contains more than one element, as in our case, there is typically more than one
admissible value. For instance, we obtain for the default put option that

$$\inf_{Q \in \mathcal{Q}} E^Q\Bigl(\frac{H}{1.05}\Bigr) \approx 0.023 \quad\text{and}\quad \sup_{Q \in \mathcal{Q}} E^Q\Bigl(\frac{H}{1.05}\Bigr) \approx 0.038; \tag{10.35}$$

obviously, the infimum and supremum in (10.35) correspond to the measures $q^0$
and $q^1$, where $q$ is minimized and maximized, respectively. This non-uniqueness of
admissible values reflects the fact that in our three-state model the put is no longer
attainable. In fact, the hedging portfolio $(\xi_1, \xi_2)$ now has to solve the following three
equations:

$$\begin{aligned}
\xi_1 \cdot 0.5 + \xi_2 \cdot 1.05 &= 1 \quad \text{(default, low recovery)}, \\
\xi_1 \cdot 0.7 + \xi_2 \cdot 1.05 &= 1 \quad \text{(default, high recovery)}, \\
\xi_1 \cdot 1 + \xi_2 \cdot 1.05 &= 0 \quad \text{(no default)}.
\end{aligned} \tag{10.36}$$


It is immediately seen that the system (10.36) of three equations and only two 
unknowns has no solution, so that the default put is not attainable. This illustrates two 
fundamental results from modern mathematical finance: a claim with bounded pay- 
off is attainable if and only if the set of admissible values consists of a single number; 
an arbitrage-free market is complete if and only if there is exactly one equivalent 
martingale measure Q. The latter result is known as the second fundamental theorem 
of asset pricing. 
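
The range of admissible values for the default put in Example 10.8 can be reproduced with a few lines of Python. The sketch below parametrizes the martingale measures by $q_1$, as in the description of the set of equivalent martingale measures, and evaluates the put price at the two boundary cases. Function names are our own; the boundary measures themselves are not equivalent to $P$, so the computed numbers are the infimum and supremum rather than attained values.

```python
PRICE, RATE = 0.941, 0.05  # bond price at t = 0 and simple interest rate

def q2_from_q1(q1):
    """Martingale condition solved for q2 (default with LGD 0.3) given q1 (LGD 0.5)."""
    return (1.0 - (1.0 + RATE) * PRICE - 0.5 * q1) / 0.3

def bond_price(q1, q2):
    """Discounted risk-neutral expectation of the bond pay-off at t = 1."""
    return (q1 * 0.5 + q2 * 0.7 + (1.0 - q1 - q2) * 1.0) / (1.0 + RATE)

def put_price(q1):
    """Admissible value of the default put: discounted default probability q1 + q2."""
    return (q1 + q2_from_q1(q1)) / (1.0 + RATE)

q1_max = (1.0 - (1.0 + RATE) * PRICE) / 0.5  # value of q1 at which q2 hits zero
lower = put_price(q1_max)  # default probability minimized
upper = put_price(0.0)     # default probability maximized
```

The resulting bounds are approximately 0.023 and 0.038, and every value of $q_1$ strictly between 0 and `q1_max` reprices the bond at 0.941.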

Example 10.8 shows that in an incomplete market new issues arise; in particular, it
is not obvious how to choose the correct price of a derivative security from the range 
of admissible values or how to deal with the risk incurred by selling a derivative 
security. This is unfortunate, as realistic models, which capture the dynamics of 
financial time series, are typically incomplete. In recent years a number of interesting 
concepts for the risk management of derivative securities in incomplete markets 
have been developed. These approaches typically propose mitigating the risk by an 
appropriate trading strategy and often suggest a pricing formula for the remaining 
risk. However, a discussion of this work is outside the scope of this book. A brief 
overview of the existing literature on hedging in (incomplete) credit markets is given 
in Notes and Comments. 


Advantages and limitations of risk-neutral pricing. The risk-neutral pricing 
approach is a relative pricing theory, which explains prices of credit products 
in terms of observable prices of other securities. If properly applied, it leads to 
arbitrage-free prices of credit-risky securities, which are consistent with prices 
quoted in the market. These features make the risk-neutral pricing approach to credit 
risk the method of choice in an environment where credit risk is actively traded and, 
in particular, for valuing credit instruments when the market for related products is 
relatively liquid. On the other hand, since pricing models have to be calibrated to 
prices of traded credit instruments, they are difficult to apply when we lack sufficient 
market information. Moreover, in such cases prices quoted using an ad hoc choice 
of some risk-neutral measure are more or less plucked out of thin air. 

This can be contrasted with the more traditional pricing methodology for loans 
and related credit products, where a loan is taken on the balance sheet if the spread 
earned on the loan is deemed by the lender to be a sufficient compensation for 
bearing the default risk of the loan and where the default risk is measured using the 
real-world measure and historical (default) data. Such an approach is well suited to 
situations where the market for related credit instruments is relatively illiquid and 
little or no price information is available; loans to medium or small businesses are 
a prime example. On the other hand, the traditional pricing methodology does not 
necessarily lead to prices that are consistent (in the sense of absence of arbitrage) 
across products or compatible with quoted market prices for credit instruments, so 
it is less suitable in a trading environment. 


Martingale modelling. Recall that, according to the first fundamental theorem of 
asset pricing, a model for security prices is arbitrage free if and (essentially) only 
if it admits at least one equivalent martingale measure Q. Moreover, in a complete 
market, the only thing that matters for the pricing of derivative securities is the 
Q-dynamics of the traded underlying assets. When building a model for pricing 
derivatives it is therefore a natural shortcut to model the objects of interest—such as 
interest rates, default times and the price processes of traded bonds—directly, under 
some exogenously specified martingale measure Q. In the literature this approach 
is termed martingale modelling. 

Martingale modelling is particularly convenient if the value H of the underlying
assets at some maturity date T is exogenously given, as in the case of zero-coupon 
bonds. In that case the price of the underlying asset at time t < T can be computed 
as the conditional expectation under Q of the discounted value at maturity via the 
risk-neutral pricing rule (10.30). Model parameters are then determined using the 
requirement that at time t = 0 the price of traded securities, as computed from the 
model using (10.30), should coincide with the price of those securities as observed 
in the market; this is known as calibration of the model to market data. 

Martingale modelling ensures that the resulting model is arbitrage free, which is 
advantageous if one has to model the prices of many different securities simultane- 
ously. The approach is therefore frequently adopted in default-free term structure 
models and in reduced-form models for credit-risky securities. Martingale mod- 
elling has two drawbacks. First, historical information is, to a large extent, useless 
in estimating model parameters, as these may change in the transition from the 
real-world measure to the equivalent martingale measure. Second, as illustrated in 
Example 10.8, realistic models for pricing credit derivatives are typically incomplete, 
so that one cannot eliminate all risk by dynamic hedging. In those situations one is 
interested in the distribution of the remaining risk under the physical measure P, so 
martingale modelling alone is not sufficient. In summary, the martingale-modelling 
approach is most suitable in situations where the market for underlying securities 
is relatively liquid. In that case we have sufficient price information to calibrate our 
models, and issues of market completeness become less relevant. 


10.4.3 Bond Pricing 


In this section we discuss the pricing of defaultable zero-coupon bonds in hazard rate 
models. Note that coupon-paying corporate bonds can be represented as a portfolio 
of zero-coupon bonds, so our analysis applies to coupon-paying bonds as well. 


Recovery models. We begin with a survey of different models for the recovery of
defaultable zero-coupon bonds. As in previous sections we denote the price at time $t$
of a defaultable zero-coupon bond with maturity $T \ge t$ by $p_1(t, T)$; $p_0(t, T)$ denotes
the price of the corresponding default-free zero-coupon bond. The face value of these
bonds is always taken to be one. The following recovery models are frequently used
in the literature.


(i) Recovery of Treasury (RT). The RT model was proposed by Jarrow and Turnbull
(1995). Under RT, if default occurs at some point in time $\tau \le T$, the
owner of the defaulted bond receives $(1 - \delta_\tau)$ units of the default-free
zero-coupon bond $p_0(\cdot, T)$ at time $\tau$, where $\delta_\tau \in [0, 1]$ models the percentage
loss given default. At maturity $T$ the holder of the defaultable bond therefore
receives the payment $I_{\{\tau > T\}} + (1 - \delta_\tau) I_{\{\tau \le T\}}$.


(ii) Recovery of Face Value (RF). Under RF, if default occurs at $\tau \le T$, the holder
of the bond receives a (possibly random) recovery payment of size $(1 - \delta_\tau)$
immediately at the default time $\tau$. Note that even with deterministic loss given
default $\delta_\tau \equiv \delta$ and deterministic interest rates, the value at maturity of the
recovery payment is random as it depends on the exact timing of default.

A further recovery model, the so-called recovery of market value, is considered in 
Section 10.5.3. In real markets, recovery is a complex issue with many legal and 
institutional features, and all recovery models put forward in the literature are at 
best a crude approximation of reality. The RF assumption comes closest to legal 
practice, as debt with the same seniority is assigned the same (fractional) recovery, 
independent of the maturity. On the other hand, for “extreme” parameter values 
(long maturities and high risk-free interest rates), RF may lead to negative credit 
spreads, as we will see in Section 10.6.3. Moreover, the RF model leads to slightly 
more involved pricing formulas for defaultable bonds than the RT model. Empirical 
evidence on recovery rates for loans and bonds is discussed in Section 11.2.3. 


Bond pricing. Next we turn to pricing formulas for defaultable bonds. We use
martingale modelling and work directly under some martingale measure $Q$. We
assume that under $Q$ the default time $\tau$ is a random time with deterministic risk-neutral
hazard function $\gamma^Q(t)$. The information available to investors at time $t$ is
given by $\mathcal{H}_t = \sigma(\{Y_u : u \le t\})$. We take interest rates and recovery rates to be
deterministic; the percentage loss given default is denoted by $\delta \in (0, 1)$, and the
continuously compounded interest rate is denoted by $r(t) \ge 0$. Note that, in this
setting, the price of the default-free zero-coupon bond with maturity $T \ge t$ equals

$$p_0(t, T) = \exp\Bigl(-\int_t^T r(s)\,ds\Bigr).$$


This is the simplest type of model that can be calibrated to a given term structure of 
default-free interest rates and single-name credit spreads; generalizations allowing 
for stochastic interest rates, recovery rates and hazard rates will be discussed in 
Section 10.5. 

The actual payments of a defaultable zero-coupon bond can be represented as
a combination of a survival claim that pays one unit at the maturity date $T$ and a
recovery payment in case of default. The survival claim has pay-off $I_{\{\tau > T\}}$. Recall
from (10.29) that

$$Q(\tau > T \mid \mathcal{H}_t) = I_{\{\tau > t\}} \exp\Bigl(-\int_t^T \gamma^Q(s)\,ds\Bigr)$$

and define $R(t) = r(t) + \gamma^Q(t)$. The price of a survival claim at time $t$ then equals

$$E^Q\bigl(p_0(t, T) I_{\{\tau > T\}} \mid \mathcal{H}_t\bigr) = \exp\Bigl(-\int_t^T r(s)\,ds\Bigr) Q(\tau > T \mid \mathcal{H}_t) = I_{\{\tau > t\}} \exp\Bigl(-\int_t^T R(s)\,ds\Bigr). \tag{10.37}$$

Note that for $\tau > t$, (10.37) can be viewed as the price of a default-free zero-coupon
bond with adjusted interest rate $R(t) > r(t)$. A similar relationship between
defaultable and default-free bond prices can be established in many reduced-form
credit risk models.

Under the RT model the value of the recovery payment at the maturity date $T$ of
the bond is $(1 - \delta) I_{\{\tau \le T\}} = (1 - \delta) - (1 - \delta) I_{\{\tau > T\}}$. Using (10.37), the value of
the recovery payment at time $t < T$ is therefore

$$(1 - \delta) p_0(t, T) - (1 - \delta) I_{\{\tau > t\}} \exp\Bigl(-\int_t^T (r(s) + \gamma^Q(s))\,ds\Bigr).$$
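
Adding the survival claim and the RT recovery payment gives the price of the defaultable zero-coupon bond under RT. The Python sketch below evaluates these pieces prior to default in the special case of a constant interest rate and a constant risk-neutral hazard rate; this flat-curve simplification, the function names and the parameter values are our own illustrative assumptions.

```python
import math

def default_free_bond(t, T, rate):
    """p_0(t, T) for a constant interest rate."""
    return math.exp(-rate * (T - t))

def survival_claim(t, T, rate, hazard):
    """Price of the survival claim on {tau > t}: discounting at R = rate + hazard."""
    return math.exp(-(rate + hazard) * (T - t))

def rt_recovery_value(t, T, rate, hazard, lgd):
    """Value on {tau > t} of the recovery payment under recovery of Treasury."""
    return (1 - lgd) * (default_free_bond(t, T, rate) - survival_claim(t, T, rate, hazard))

def rt_bond_price(t, T, rate, hazard, lgd):
    """Defaultable zero-coupon bond under RT: survival claim plus recovery payment."""
    return survival_claim(t, T, rate, hazard) + rt_recovery_value(t, T, rate, hazard, lgd)

# Illustrative values: 5-year bond, 3% interest, 2% hazard rate, 60% loss given default.
price = rt_bond_price(0.0, 5.0, rate=0.03, hazard=0.02, lgd=0.6)
spread = -math.log(price / default_free_bond(0.0, 5.0, 0.03)) / 5.0
```

In this flat setting the continuously compounded credit spread is positive and does not exceed the product of loss given default and hazard rate.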


Under the RF hypothesis the recovery payment takes the form $(1 - \delta) I_{\{\tau \le T\}}$, where
the payment occurs directly at time $\tau$. Payments of this form will be referred to
as payment-at-default claims. The value of the recovery payment at time $t < T$
therefore equals

$$E^Q\Bigl((1 - \delta) I_{\{t < \tau \le T\}} \exp\Bigl(-\int_t^\tau r(s)\,ds\Bigr) \Bigm| \mathcal{H}_t\Bigr).$$

The evaluation of this expression is discussed in the following lemma.


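Lemma 10.9. Suppose that $\tau$ is a random time with hazard function $\gamma^Q(t)$, and let
$R(t) = r(t) + \gamma^Q(t)$ as before. Then we have the identity

$$E^Q\Bigl(I_{\{t < \tau \le T\}} \exp\Bigl(-\int_t^\tau r(s)\,ds\Bigr) \Bigm| \mathcal{H}_t\Bigr) = I_{\{\tau > t\}} \int_t^T \gamma^Q(s) \exp\Bigl(-\int_t^s R(u)\,du\Bigr)\,ds.$$


Proof. Using Lemma 10.6 we get that

$$E^Q\Bigl(I_{\{t < \tau \le T\}} \exp\Bigl(-\int_t^\tau r(s)\,ds\Bigr) \Bigm| \mathcal{H}_t\Bigr) = I_{\{\tau > t\}} \frac{E^Q\bigl(I_{\{t < \tau \le T\}} \exp(-\int_t^\tau r(s)\,ds)\bigr)}{\exp(-\int_0^t \gamma^Q(s)\,ds)}. \tag{10.38}$$

Since $\tau$ has density $\gamma^Q(t) \exp\bigl(-\int_0^t \gamma^Q(s)\,ds\bigr)$, we have

$$E^Q\Bigl(I_{\{t < \tau \le T\}} \exp\Bigl(-\int_t^\tau r(s)\,ds\Bigr)\Bigr) = \int_t^T \exp\Bigl(-\int_t^s r(u)\,du\Bigr) \gamma^Q(s) \exp\Bigl(-\int_0^s \gamma^Q(u)\,du\Bigr)\,ds.$$

Substitution of the right-hand side into equation (10.38) gives the result.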
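
The integral in Lemma 10.9 is easy to evaluate numerically. The Python sketch below compares a midpoint-rule evaluation for general deterministic $r$ and $\gamma^Q$ with the closed form that results when both are constant, namely $\gamma/(r + \gamma) \cdot (1 - e^{-(r + \gamma)(T - t)})$ on the event that default has not yet occurred. The function names, the step count and the parameter values are our own illustrative choices.

```python
import math

def payment_at_default_closed_form(t, T, rate, hazard):
    """Value on {tau > t} of one unit paid at default before T, constant r and gamma."""
    total = rate + hazard
    return hazard / total * (1.0 - math.exp(-total * (T - t)))

def payment_at_default_quadrature(t, T, rate_fn, hazard_fn, steps=2000):
    """Midpoint rule for int_t^T gamma(s) exp(-int_t^s R(u) du) ds with R = r + gamma."""
    h = (T - t) / steps
    cumulative_R = 0.0  # running approximation of int_t^(left end of cell) R(u) du
    value = 0.0
    for i in range(steps):
        mid = t + (i + 0.5) * h
        R_mid = rate_fn(mid) + hazard_fn(mid)
        # Discount factor evaluated at the midpoint of the current cell.
        value += hazard_fn(mid) * math.exp(-(cumulative_R + 0.5 * h * R_mid)) * h
        cumulative_R += h * R_mid
    return value

# Illustrative flat curves: 3% interest rate and 2% hazard rate over 5 years.
closed = payment_at_default_closed_form(0.0, 5.0, rate=0.03, hazard=0.02)
numeric = payment_at_default_quadrature(0.0, 5.0, lambda s: 0.03, lambda s: 0.02)
```

The two values agree up to discretization error, and the discounted value of the claim is smaller than the undiscounted default probability over the same period.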


10.4.4 CDS Pricing 


The CDS market is among the most liquid markets for credit-risky securities, so 
the task of building a model using CDS spreads as input is frequently encountered 
in practice. In this section we therefore discuss CDS pricing and the calibration of 
hazard rate models to observed CDS spreads. 

Pricing. We consider the following CDS contract. We take the notional to be one,
so that percentage loss given default and absolute loss given default are the same.
The premium payments are due at $N$ points in time $0 < t_1 < \cdots < t_N$. If $\tau > t_k$,
the protection buyer pays a premium of size $x^*(t_k - t_{k-1})$ at $t_k$, where $x^*$ denotes
the swap spread. After default, no further premium payments are made. If default
occurs before the maturity date $t_N$ of the swap, the protection seller makes a default
payment of size $\delta$ to the buyer at the default time $\tau$. In a standard CDS the protection
buyer pays the protection seller at default the part of the premium that has accrued 
since the last regular premium payment date; here we ignore these accrued premium 
payments to simplify the exposition. 

We use the same set-up as in the analysis of bond pricing in the previous section. 
As a first step we price the payments made by the protection buyer (the so-called 
premium payment leg of the swap) and the payments made by the protection seller 
(the default payment leg) separately, using a generic risk-neutral hazard function 
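$\gamma^Q$ and a generic spread $x$. The price of the premium payment leg at $t < t_N$ (the
expected discounted value of the payments) is given by

$$\begin{aligned}
V_t^{\mathrm{prem}}(x;\gamma^Q) &= E^Q\Bigl(\sum_{k:\,t_k>t}\exp\Bigl(-\int_t^{t_k} r(u)\,\mathrm{d}u\Bigr)\,x\,(t_k-t_{k-1})\,I_{\{t_k<\tau\}}\,\Big|\,\mathcal{H}_t\Bigr)\\
&= x\sum_{k:\,t_k>t} p_0(t,t_k)\,(t_k-t_{k-1})\,Q(\tau>t_k\mid\mathcal{H}_t),
\end{aligned} \tag{10.39}$$

which is easily computed using the formula

$$Q(\tau>t_k\mid\mathcal{H}_t) = I_{\{\tau>t\}}\exp\Bigl(-\int_t^{t_k}\gamma^Q(s)\,\mathrm{d}s\Bigr).$$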


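The default payment leg is a typical payment-at-default claim. Using Lemma 10.9
we obtain

$$\begin{aligned}
V_t^{\mathrm{def}}(\gamma^Q) &= E^Q\Bigl(\exp\Bigl(-\int_t^\tau r(u)\,\mathrm{d}u\Bigr)\,\delta\,I_{\{t<\tau\le t_N\}}\,\Big|\,\mathcal{H}_t\Bigr)\\
&= \delta\,I_{\{\tau>t\}}\int_t^{t_N}\gamma^Q(s)\exp\Bigl(-\int_t^s (r(u)+\gamma^Q(u))\,\mathrm{d}u\Bigr)\,\mathrm{d}s.
\end{aligned} \tag{10.40}$$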


According to market convention the CDS spread $x_t^*$ quoted for the contract at time
$t$ (the so-called fair CDS spread) is chosen such that the value of the contract
is equal to zero. Hence $x_t^*$ is defined by the equation
$V_t^{\mathrm{prem}}(x_t^*;\gamma^Q) = V_t^{\mathrm{def}}(\gamma^Q)$, which gives

$$x_t^* = I_{\{\tau>t\}}\,\frac{\delta\int_t^{t_N}\gamma^Q(s)\exp\bigl(-\int_t^s (r(u)+\gamma^Q(u))\,\mathrm{d}u\bigr)\,\mathrm{d}s}{\sum_{k:\,t_k>t} p_0(t,t_k)\,(t_k-t_{k-1})\exp\bigl(-\int_t^{t_k}\gamma^Q(s)\,\mathrm{d}s\bigr)}. \tag{10.41}$$

Obviously, $x_t^*$ depends on the hazard function $\gamma^Q$, as both $V_t^{\mathrm{prem}}$ and
$V_t^{\mathrm{def}}$ depend on $\gamma^Q$.

Note that in the pricing argument we have neglected the issue of counterparty risk
and, in particular, the possibility that the protection seller might default before the
maturity of the CDS. A discussion of counterparty risk for CDS contracts is given
in Section 17.2.

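The fair-spread formula (10.41) can be evaluated numerically at $t = 0$. The following is a minimal sketch, not a production pricer: it assumes a flat interest rate `r` (so that $p_0(0,t) = e^{-rt}$), uses the trapezoidal rule on a uniform grid, and the function name `fair_cds_spread` and all parameter values are hypothetical choices made for illustration.

```python
import math

def fair_cds_spread(hazard, r, delta, payment_dates, n_grid=10_000):
    """Fair spread at t = 0 from (10.41), assuming a flat rate r (illustrative)."""
    t_N = payment_dates[-1]
    h = t_N / n_grid
    grid = [i * h for i in range(n_grid + 1)]
    gam = [hazard(s) for s in grid]

    # Running cumulative hazard Gamma(s) = int_0^s gamma(u) du (trapezoidal rule).
    cum = [0.0]
    for i in range(1, n_grid + 1):
        cum.append(cum[-1] + 0.5 * h * (gam[i - 1] + gam[i]))

    # Default leg: delta * int_0^{t_N} gamma(s) exp(-(r s + Gamma(s))) ds.
    f = [gam[i] * math.exp(-r * grid[i] - cum[i]) for i in range(n_grid + 1)]
    default_leg = delta * h * (sum(f) - 0.5 * (f[0] + f[-1]))

    # Premium leg per unit spread: sum_k p0(0,t_k) (t_k - t_{k-1}) exp(-Gamma(t_k)).
    annuity, prev = 0.0, 0.0
    for t_k in payment_dates:
        pos = min(t_k / h, n_grid)           # position of t_k on the grid
        i = min(int(pos), n_grid - 1)
        gamma_tk = cum[i] + (pos - i) * (cum[i + 1] - cum[i])  # linear interpolation
        annuity += math.exp(-r * t_k - gamma_tk) * (t_k - prev)
        prev = t_k
    return default_leg / annuity


if __name__ == "__main__":
    dates = [0.25 * k for k in range(1, 21)]   # quarterly premiums over 5 years
    flat = fair_cds_spread(lambda s: 0.02, 0.03, 0.6, dates)
    stepped = fair_cds_spread(lambda s: 0.01 if s < 2.0 else 0.03, 0.03, 0.6, dates)
    print(f"flat hazard:    {1e4 * flat:.1f} bp")
    print(f"stepped hazard: {1e4 * stepped:.1f} bp")
```

The hazard function is passed as a callable, so constant, piecewise-constant and piecewise-linear specifications can be compared without changing the integration code.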
Calibration. Assume now that we observe spreads quoted in the market for one or
more CDSs on the same reference entity. Under the martingale-modelling approach
we have to calibrate our model to the available market information: that is, we have
to determine the implied risk-neutral hazard function $\gamma^Q$, which ensures that the fair
CDS spreads implied by the model equal the spreads that are quoted in the market.

Suppose that the market information at time $t = 0$ consists of the fair spread $x^*$ of
one CDS with maturity $t_N$, and take the risk-neutral hazard function $\gamma^Q$ to be constant, so that,
for all $s \ge 0$, $\gamma^Q(s) = \bar\gamma^Q$ for some $\bar\gamma^Q > 0$, which we refer to as the risk-neutral
hazard rate. It follows from (10.39) and (10.40) that the implied risk-neutral hazard
rate $\bar\gamma^Q$ satisfies the equation


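$$x^*\sum_{k=1}^N p_0(0,t_k)\,(t_k-t_{k-1})\,e^{-\bar\gamma^Q t_k} = \delta\,\bar\gamma^Q\int_0^{t_N} p_0(0,t)\,e^{-\bar\gamma^Q t}\,\mathrm{d}t. \tag{10.42}$$

Here, the left-hand side equals $V_0^{\mathrm{prem}}(x^*;\bar\gamma^Q)$ and the right-hand side is
equal to $V_0^{\mathrm{def}}(\bar\gamma^Q)$. There is a unique implied risk-neutral hazard rate solving
equation (10.42). This may be seen by first noting that $V_0^{\mathrm{prem}}(x^*;\bar\gamma^Q)$ is a
decreasing function of $\bar\gamma^Q$ while $V_0^{\mathrm{def}}(\bar\gamma^Q)$ is an increasing function of $\bar\gamma^Q$.
Moreover, $V_0^{\mathrm{def}}(0) = 0$, so the value of the premium payments exceeds the value of the
default payment for small values of $\bar\gamma^Q$. On the other hand, as $\bar\gamma^Q$ tends to infinity,
$V_0^{\mathrm{prem}}(x^*;\bar\gamma^Q)$ converges to zero, so $V_0^{\mathrm{prem}}(x^*;\bar\gamma^Q) < V_0^{\mathrm{def}}(\bar\gamma^Q)$ for large values
of $\bar\gamma^Q$.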

If one observes spreads for several CDSs on the same reference entity but with
different maturities, a time-independent risk-neutral hazard function is generally not
sufficient to calibrate the model to the observed swap spreads. Instead one typically
uses hazard functions $\gamma^Q(t)$ that are piecewise constant or piecewise linear. An
exception occurs in the special case where (1) the spread curve is flat (i.e. all CDSs
on the reference entity have the same spread $x^*$, independent of the maturity),
(2) the risk-free interest rate is constant, and (3) the time points $t_k$ are equally spaced
($t_k - t_{k-1} = \Delta t$ for all $k$). In that case the implied risk-neutral hazard rate $\bar\gamma^Q$ is the
solution of equation (10.42) in the case where $N = 1$, that is, the solution of

$$x^*\,\Delta t\,p_0(0,\Delta t)\,e^{-\bar\gamma^Q\Delta t} = \delta\,\bar\gamma^Q\int_0^{\Delta t} e^{-rt}\,e^{-\bar\gamma^Q t}\,\mathrm{d}t. \tag{10.43}$$

For $\Delta t$ relatively small (quarterly or semiannual spread payments), a good approximation
to the solution of (10.43) is given by $\bar\gamma^Q \approx x^*/\delta$, i.e. by the ratio of the fair
swap spread and the percentage loss given default. This approximation is frequently
used in practice.

Note, finally, that for most issuers the implied hazard rate is relatively small (of the
order of a few percentage points). We therefore have the following approximation
for the one-year default probability:

$$Q(\tau \le 1) = 1 - e^{-\bar\gamma^Q} \approx \bar\gamma^Q \approx x^*/\delta, \tag{10.44}$$

so the quantity $x^*/\delta$ can be viewed as a proxy for the risk-neutral one-year default
probability.
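The calibration argument above can be sketched in code. Because the premium leg is decreasing and the default leg increasing in the hazard rate, bisection is one simple way to solve (10.42). The sketch assumes a flat interest rate `r`, so that the integral on the right-hand side is available in closed form; the function names and the parameter values in the example are hypothetical.

```python
import math

def premium_leg(x, gamma, r, dates):
    """Left-hand side of (10.42) with p0(0, t) = exp(-r t)."""
    prev, total = 0.0, 0.0
    for t_k in dates:
        total += math.exp(-(r + gamma) * t_k) * (t_k - prev)
        prev = t_k
    return x * total

def default_leg(gamma, r, delta, t_N):
    """Right-hand side of (10.42): delta * gamma * int_0^{t_N} exp(-(r + gamma) t) dt."""
    a = r + gamma
    integral = t_N if a == 0.0 else (1.0 - math.exp(-a * t_N)) / a
    return delta * gamma * integral

def implied_hazard_rate(x_star, r, delta, dates, tol=1e-12):
    """Solve premium_leg = default_leg for the constant hazard rate by bisection."""
    def gap(gamma):
        return premium_leg(x_star, gamma, r, dates) - default_leg(gamma, r, delta, dates[-1])

    lo, hi = 0.0, 1.0
    while gap(hi) > 0.0:      # expand the bracket until the default leg dominates
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gap(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    dates = [0.25 * k for k in range(1, 21)]           # quarterly, 5-year CDS
    gamma = implied_hazard_rate(0.012, 0.03, 0.6, dates)
    print(f"implied hazard rate: {gamma:.5f}")
    print(f"proxy x*/delta:      {0.012 / 0.6:.5f}")
```

Comparing the two printed values gives a feel for how close the market rule of thumb $\bar\gamma^Q \approx x^*/\delta$ is to the exact solution for a given payment frequency.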
10.4.5 P versus Q: Empirical Results


We have now assembled the necessary technical tools to discuss some of the empiri- 
cal work on the relationship between physical and risk-neutral default probabilities. 
Understanding this relationship is important; it enables market participants to use 
information about historical default probabilities in pricing credit-risky securities. 
Conversely, it allows the use of market quotes for CDSs or defaultable bonds as 
additional inputs in determining historical default probabilities. 

In most empirical studies risk-neutral default probabilities are estimated from 
credit-spread data for CDSs. By comparing these estimates with estimates of the 
physical default probability—obtained, for instance, from the public-firm EDF 
methodology introduced in Section 10.3.3—it is possible to gain some empirical evi- 
dence on the relationship between physical and risk-neutral default probabilities in 
real markets. An extensive empirical study along these lines is found in Berndt et al. 
(2008). The authors carried out a very detailed regression analysis of the observed 
spreads for five-year CDSs against five-year EDFs for a large pool of firms. The 
five-year EDF of a firm with publicly traded stock is an annualized estimate of 
the physical five-year default probability. The computation of EDFs is described in 
detail in Section 10.3.3, and annualization is a way of expressing EDFs for different 
time horizons on a common yearly scale. 

Berndt et al. (2008) begin by estimating a linear model for the relationship between
the observed swap spread $x^*_{t,i}$ of firm $i$ at date $t$ and the five-year EDF of that firm
on the same day, labelled $\mathrm{EDF}_{t,i}$. The model takes the form

$$x^*_{t,i} = \alpha + \beta\,\mathrm{EDF}_{t,i} + \varepsilon_{t,i},\quad (t,i)\in S, \tag{10.45}$$

where $S$ denotes the set of all time points/firms for which there is an observable EDF-CDS
pair. The model was fitted to a sample of 33 912 EDF-CDS observations for a
large set of publicly traded US firms in the period December 2000 to December 2004.
The estimated coefficients were given by $\hat\alpha = 33$ bp and $\hat\beta = 1.6$; the $R^2$ was 0.73.

Berndt et al. (2008) propose the following interpretation of this regression result.
Their model implies that the fair swap spread $x^*$ of a firm increases by approximately
16 basis points for every 10 basis point increase in the five-year EDF of that firm;
neglecting the intercept, we thus have that $x^*_{t,i}/\mathrm{EDF}_{t,i} \approx 1.6$. Assuming a fixed loss
given default $\delta$, we may use the quantity $q_{t,i} = x^*_{t,i}/\delta$ as a proxy for the risk-neutral
default probability of firm $i$ at time $t$; moreover, $\mathrm{EDF}_{t,i}$ can be viewed as a proxy
for the physical default probability $p_{t,i}$ of firm $i$ at time $t$. The ratio of risk-neutral to
historical default probabilities is therefore given approximately by

$$\frac{q_{t,i}}{p_{t,i}} \approx \frac{x^*_{t,i}}{\delta\,\mathrm{EDF}_{t,i}} \approx \frac{1.6}{\delta}.$$

With $\delta = 0.75$ we obtain $q_{t,i}/p_{t,i} \approx 2.13$; higher recovery rates, i.e. smaller values
of $\delta$, would lead to an even higher estimate for $q_{t,i}/p_{t,i}$. The analysis of Berndt et al.
(2008) clearly shows that physical and risk-neutral default probabilities can differ
substantially, and care must be taken to distinguish between the two concepts.
Figure 10.6. Ratio of one-year risk-neutral and historical default probabilities
for Vintage Petroleum, as estimated by Berndt et al. (2008).


A careful inspection of the EDF-CDS relationship shows that the simple linear 
model (10.45) might not be appropriate for a number of reasons. First, the intercept 
of 33 basis points is implausible, as it would imply that even for a firm with historical 
default probability p close to zero the swap spread is still of the order of 30 basis 
points. Second, Berndt et al. (2008) found that the ratio $x^*_{t,i}/\mathrm{EDF}_{t,i}$ varies between
industry sectors (reflecting different recovery rates for different industries) and
over time, as is illustrated in Figure 10.6. Third, there seems to be some concavity in
the relationship between swap spreads and EDFs; in particular, the ratio $x^*_{t,i}/\mathrm{EDF}_{t,i}$
is higher for high-quality firms with low EDF values than for low-quality firms.
For these reasons the authors go on to consider more refined logarithmic regression 
models that fit the data significantly better. 


Notes and Comments 


Hazard rate models are a common tool in credit risk and survival analysis: see, for 
example, Bielecki and Rutkowski (2002) or, for a general introduction to survival 
analysis, the classical textbook by Cox and Oakes (1984). Further useful textbooks 
are Fleming and Harrington (2005), Marshall and Olkin (2007) and Aalen, Borgan 
and Gjessing (2010). 

The fundamental theorems of asset pricing and the conceptual underpinnings 
of risk-neutral pricing are discussed in most textbooks on mathematical finance: 
see, for example, Duffie (2001), Björk (2004), Shreve (2004b) and Delbaen and 
Schachermayer (2006). The term martingale modelling was coined in Björk (2004) 
in the context of default-free short-rate models. In recent years a number of inter- 
esting approaches to the risk management of derivative securities in incomplete 
markets have been developed. Quadratic hedging approaches were first developed 
by Föllmer and Sondermann (1986) and Föllmer and Schweizer (1991); Schweizer 
(2001) is an excellent survey; utility-based approaches to pricing and hedging in 
incomplete markets are discussed in Delbaen et al. (2002) and Becherer (2004), and 
the latter paper explicitly considers applications of utility-based hedging strategies
to credit risk models. Papers dealing with dynamic hedging and market incom- 
pleteness in credit risk models include Bielecki, Jeanblanc and Rutkowski (2004), 
Bielecki, Jeanblanc and Rutkowski (2007), Frey and Backhaus (2010) and Cont and 
Kan (2011). 

A detailed analysis of CDS pricing can be found in many sources; a good reference 
is Schönbucher (2003). Theoretical results on the relationship between physical and
risk-neutral default probabilities were obtained by Artzner and Delbaen (1995) and 
Jarrow, Lando and Yu (2005). In their paper, Berndt et al. (2008) go beyond the 
regression analysis presented in our text and estimate a full time-series model for the 
joint evolution of risk-neutral and actual default intensities. Further empirical studies 
of the relationship between actual and risk-neutral default probabilities include Fons 
(1994), Bohn (2000), Driessen (2005) and Huang and Huang (2012). These results 
largely corroborate the findings of Berndt et al. (2008). 


10.5 Pricing with Stochastic Hazard Rates 


In the models with deterministic hazard functions discussed in Section 10.4, the 
only risk factor affecting a defaultable bond or a CDS is default risk. Hence in these 
models credit spreads evolve deterministically prior to default, which is clearly 
unrealistic. Moreover, it is not possible to price options on defaultable bonds or 
CDSs in such models. In this section we consider models where the hazard function 
is replaced by a stochastic hazard process. In mathematical terms this leads to the 
notion of doubly stochastic random times, which is discussed in Section 10.5.1. In 
Section 10.5.2 we derive pricing formulas for certain building blocks that can be 
used to value many important credit-risky securities. Applications of these formulas 
are studied in Section 10.5.3. 


10.5.1 Doubly Stochastic Random Times 


We now consider a situation where additional information affecting the distribution
of the random time $\tau$ is available. Formally, we represent this additional information
by some filtration $(\mathcal{F}_t)$ on the underlying probability space $(\Omega,\mathcal{F},P)$. In credit
risk models this information is typically generated by some background process
$(\Psi_t)$ representing, for instance, the risk-free interest rate or various measures of
economic activity, so that $\mathcal{F}_t = \sigma(\{\Psi_s : s \le t\})$.

Consider some random time $\tau$ on $(\Omega,\mathcal{F},P)$ with $P(\tau > 0) = 1$ and denote by
$Y_t = I_{\{\tau\le t\}}$ the associated jump indicator and by $(\mathcal{H}_t)$ the filtration generated by
$(Y_t)$ (see equation (10.26)). We introduce a new filtration $(\mathcal{G}_t)$ by

$$\mathcal{G}_t = \mathcal{F}_t \vee \mathcal{H}_t,\quad t \ge 0, \tag{10.46}$$

meaning that $\mathcal{G}_t$ is the smallest $\sigma$-algebra that contains $\mathcal{F}_t$ and $\mathcal{H}_t$. We will
frequently use the notation $(\mathcal{G}_t) = (\mathcal{F}_t) \vee (\mathcal{H}_t)$ below. The filtration $(\mathcal{G}_t)$ contains
information about the background processes and the occurrence or non-occurrence
of default up to time $t$, and thus typically corresponds to the information available
to investors. Obviously, $\tau$ is an $(\mathcal{H}_t)$ stopping time and hence also a $(\mathcal{G}_t)$ stopping
time. Note, however, that we do not assume that $\tau$ is a stopping time with respect
to the background filtration $(\mathcal{F}_t)$.

Doubly stochastic random times are a straightforward extension of the models 
considered in Section 10.4 to the present set-up with additional information. 


Definition 10.10 (doubly stochastic random time). A random time $\tau$ is said to
be doubly stochastic if there exists a positive $(\mathcal{F}_t)$-adapted process $(\gamma_t)$ such that
$\Gamma_t = \int_0^t \gamma_s\,\mathrm{d}s$ is strictly increasing and finite for every $t > 0$ and such that, for all
$t \ge 0$,

$$P(\tau > t \mid \mathcal{F}_\infty) = \exp\Bigl(-\int_0^t \gamma_s\,\mathrm{d}s\Bigr). \tag{10.47}$$

In that case $(\gamma_t)$ is referred to as the $(\mathcal{F}_t)$-conditional hazard process of $\tau$.


In (10.47) $\mathcal{F}_\infty$ denotes the smallest $\sigma$-algebra that contains $\mathcal{F}_t$ for all $t \ge 0$: that
is, $\mathcal{F}_\infty = \sigma(\bigcup_{t\ge 0}\mathcal{F}_t)$. Conditioning on $\mathcal{F}_\infty$ thus means that we know the past and
future economic environment and in particular the entire trajectory $(\gamma_s(\omega))_{s\ge 0}$ of the
hazard rate process. Hence (10.47) implies that, given the economic environment,
$\tau$ is a random time with deterministic hazard function given by the mapping
$s \mapsto \gamma_s(\omega)$. The term doubly stochastic refers to the fact that the hazard rate
at any time is itself a realization of a stochastic process. In the literature, doubly
stochastic random times are also known as conditional Poisson or Cox random times.
Note, finally, that (10.47) implies that $P(\tau \le t \mid \mathcal{F}_\infty)$ is $\mathcal{F}_t$-measurable, so we have
the equality

$$P(\tau \le t \mid \mathcal{F}_\infty) = P(\tau \le t \mid \mathcal{F}_t). \tag{10.48}$$


In the next lemma we give an explicit construction of doubly stochastic random 
times. This construction is very useful for simulation purposes. 


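Lemma 10.11. Let $X$ be a standard exponentially distributed rv on $(\Omega,\mathcal{F},P)$
independent of $\mathcal{F}_\infty$, i.e. $P(X \le t \mid \mathcal{F}_\infty) = 1 - e^{-t}$ for all $t \ge 0$. Let $(\gamma_t)$ be a
positive $(\mathcal{F}_t)$-adapted process such that $\Gamma_t = \int_0^t \gamma_s\,\mathrm{d}s$ is strictly increasing and finite
for every $t > 0$. Define the random time $\tau$ by

$$\tau := \Gamma^{\leftarrow}(X) = \inf\{t \ge 0 : \Gamma_t \ge X\}. \tag{10.49}$$

Then $\tau$ is doubly stochastic with $(\mathcal{F}_t)$-conditional hazard rate process $(\gamma_t)$.


Proof. Note that by definition of $\tau$ it holds that $\{\tau > t\} = \{\Gamma_t < X\}$. Since $\Gamma_t$ is
$\mathcal{F}_\infty$-measurable and $X$ is independent of $\mathcal{F}_\infty$, we obtain

$$P(\tau > t \mid \mathcal{F}_\infty) = P(\Gamma_t < X \mid \mathcal{F}_\infty) = e^{-\Gamma_t},$$

which proves the claim.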


Lemma 10.11 has the following converse. 


Lemma 10.12. Let $\tau$ be a doubly stochastic random time with $(\mathcal{F}_t)$-conditional hazard
process $(\gamma_t)$. Denote by $\Gamma_t = \int_0^t \gamma_s\,\mathrm{d}s$ the $(\mathcal{F}_t)$-conditional cumulative hazard
process of $\tau$ and set $X := \Gamma_\tau$. Then the rv $X$ is standard exponentially distributed
and independent of $\mathcal{F}_\infty$, and $\tau = \Gamma^{\leftarrow}(X)$ almost surely.

Figure 10.7. A graphical illustration of Algorithm 10.13: $X \approx 0.44$, $\tau \approx 6.59$.


Proof. Since $(\Gamma_t)$ is strictly increasing by assumption, the relation $\tau = \Gamma^{\leftarrow}(X)$ is
clear from the definition of $X$. To prove that $X$ has the correct distribution we argue
as follows:

$$P(X \le t \mid \mathcal{F}_\infty) = P(\Gamma_\tau \le t \mid \mathcal{F}_\infty) = P(\tau \le \Gamma^{\leftarrow}(t) \mid \mathcal{F}_\infty).$$

Since $\tau$ is doubly stochastic, the last expression equals
$1 - \exp(-\Gamma(\Gamma^{\leftarrow}(t))) = 1 - e^{-t}$, as $\Gamma$ is continuous and strictly increasing by assumption.
This shows that $X$ is independent of $\mathcal{F}_\infty$ and that it is standard exponentially distributed.


Lemma 10.11 forms the basis for the following algorithm for the simulation of 
doubly stochastic random times. 


Algorithm 10.13 (univariate threshold simulation). 


(1) Generate a trajectory of the hazard process $(\gamma_t)$. References for suitable
simulation approaches are given in Notes and Comments.


(2) Generate a unit exponential rv $X$ independent of $(\gamma_t)$ (the threshold) and set
$\tau = \Gamma^{\leftarrow}(X)$; this step is illustrated in Figure 10.7.
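The two steps of Algorithm 10.13 can be sketched as follows. The hazard model used in step (1) is purely an assumption made for illustration: the log-hazard follows a discretized Ornstein-Uhlenbeck process, which keeps the path strictly positive, and all parameter values and function names are hypothetical. Step (2) integrates the path with the trapezoidal rule and inverts the cumulative hazard by linear interpolation within a grid step.

```python
import math
import random

def simulate_hazard_path(rng, horizon, n_steps, gamma0=0.03, kappa=0.5,
                         theta=0.03, sigma=0.4):
    """Step (1): positive hazard path on a grid (illustrative log-OU dynamics)."""
    dt = horizon / n_steps
    y = math.log(gamma0)
    path = [gamma0]
    for _ in range(n_steps):
        y += kappa * (math.log(theta) - y) * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        path.append(math.exp(y))
    return path, dt

def threshold_default_time(path, dt, x):
    """Step (2): tau = inf{t : Gamma_t >= x}; None if Gamma stays below x on the grid."""
    cum = 0.0
    for i in range(1, len(path)):
        inc = 0.5 * dt * (path[i - 1] + path[i])   # trapezoidal increment of Gamma
        if cum + inc >= x:
            # Interpolate linearly inside the step where the threshold is crossed.
            return (i - 1) * dt + dt * (x - cum) / inc
        cum += inc
    return None

def simulate_default_time(rng, horizon=50.0, n_steps=5000):
    path, dt = simulate_hazard_path(rng, horizon, n_steps)
    x = rng.expovariate(1.0)                        # unit exponential threshold
    return threshold_default_time(path, dt, x)


if __name__ == "__main__":
    rng = random.Random(42)
    draws = [simulate_default_time(rng) for _ in range(5)]
    print(draws)   # None means no default before the simulation horizon
```

Returning `None` rather than the horizon keeps censored paths distinguishable from defaults that happen to fall near the end of the grid.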


Moreover, Lemmas 10.11 and 10.12 provide an interesting interpretation of doubly
stochastic random times in terms of operational time. For a given $(\mathcal{F}_t)$-adapted
hazard process $(\gamma_t)$, define a new timescale (operational time) by the associated
cumulative hazard process $\Gamma_t = \int_0^t \gamma_s\,\mathrm{d}s$, so that $c$ units of operational time
correspond to $\Gamma^{\leftarrow}(c)$ units of real time. Take a standard exponential rv $X$ independent
of $\mathcal{F}_\infty$ and measure time in units of operational time. The associated calendar
time $\tau := \Gamma^{\leftarrow}(X)$ is then doubly stochastic by Lemma 10.11. Conversely, by
Lemma 10.12, if we take a doubly stochastic random time $\tau$, the associated operational
time $X := \Gamma_\tau$ is standard exponential, independent of $\mathcal{F}_\infty$. The notion of operational
time plays an important role in insurance mathematics (see Section 13.2.7).
Intensity of doubly stochastic random times. We have seen in Proposition 10.7 that
the jump indicator process $(Y_t)$ can be turned into an $(\mathcal{H}_t)$-martingale if we subtract
the process $\int_0^{t\wedge\tau}\gamma(s)\,\mathrm{d}s$, where $t\wedge\tau$ is a shorthand notation for $\min\{t,\tau\}$. We now
generalize this result to doubly stochastic random times.


Proposition 10.14. Let $\tau$ be a doubly stochastic random time with $(\mathcal{F}_t)$-conditional
hazard process $(\gamma_t)$. Then $M_t := Y_t - \int_0^{t\wedge\tau}\gamma_s\,\mathrm{d}s$ is a $(\mathcal{G}_t)$-martingale.


Proof. Define a new artificial filtration $(\tilde{\mathcal{G}}_t)$ by $\tilde{\mathcal{G}}_t = \mathcal{F}_\infty \vee \mathcal{H}_t$, and note that
$\tilde{\mathcal{G}}_0 = \mathcal{F}_\infty$ and $\mathcal{G}_t \subset \tilde{\mathcal{G}}_t$ for all $t$. As explained above, given $\mathcal{F}_\infty$, $\tau$ is a random time
with deterministic hazard rate. Proposition 10.7 implies that $M_t := Y_t - \int_0^{t\wedge\tau}\gamma_s\,\mathrm{d}s$
is a martingale with respect to $(\tilde{\mathcal{G}}_t)$. Since $(M_t)$ is $(\mathcal{G}_t)$-adapted and $\mathcal{G}_t \subset \tilde{\mathcal{G}}_t$, $(M_t)$ is
also a martingale with respect to $(\mathcal{G}_t)$.


Finally, we relate Proposition 10.14 to the popular notion of the intensity of a 
random time. 


Definition 10.15. Consider a filtration $(\mathcal{G}_t)$ and a random time $\tau$ with $(\mathcal{G}_t)$-adapted
jump indicator process $(Y_t)$. A non-negative $(\mathcal{G}_t)$-adapted process $(\lambda_t)$ is called a
$(\mathcal{G}_t)$-intensity process of the random time $\tau$ if $M_t := Y_t - \int_0^{t\wedge\tau}\lambda_s\,\mathrm{d}s$ is a
$(\mathcal{G}_t)$-martingale.


In reduced-form credit risk models, $(\lambda_t)$ is usually called the default intensity of
the default time $\tau$. It is well known that the intensity $(\lambda_t)$ is uniquely defined on
$\{t < \tau\}$. This is an immediate consequence of general results from stochastic calculus
concerning the uniqueness of semimartingale decompositions (see, for example,
Chapter 2 of Protter (2005)). Using the terminology of Definition 10.15, we may
restate Proposition 10.14 in the following form: “the $(\mathcal{G}_t)$-intensity of a doubly
stochastic random time $\tau$ is given by its $(\mathcal{F}_t)$-conditional hazard process $(\gamma_t)$”. At
this point a warning is in order: there are random times that admit an intensity in the
sense of Definition 10.15 that are not doubly stochastic and for which the pricing
formulas derived in Section 10.5.2 below do not hold.


Conditional expectations. Next we discuss the structure of conditional expectations
with respect to the full-information $\sigma$-algebra $\mathcal{G}_t$; these results are crucial for
the derivation of pricing formulas in models with doubly stochastic default times.


Proposition 10.16. Let $\tau$ be an arbitrary random time (not necessarily doubly
stochastic) such that $P(\tau > t \mid \mathcal{F}_t) > 0$ for all $t \ge 0$. We then have for every
integrable rv $X$ that

$$E(I_{\{\tau>t\}}X \mid \mathcal{G}_t) = I_{\{\tau>t\}}\,\frac{E(I_{\{\tau>t\}}X \mid \mathcal{F}_t)}{P(\tau>t \mid \mathcal{F}_t)}.$$


Note that Proposition 10.16 allows us to replace certain conditional expectations
with respect to $\mathcal{G}_t$ by conditional expectations with respect to the background
information $\mathcal{F}_t$. The result is also known as the Dellacherie formula. In the special case
where the background filtration is trivial, i.e. $\mathcal{F}_t = \{\emptyset,\Omega\}$ for all $t \ge 0$,
Proposition 10.16 reduces to Lemma 10.6.
Proof. Standard measure-theoretic arguments show that for every $\mathcal{G}_t$-measurable
rv $X$ there is some $\mathcal{F}_t$-measurable rv $\tilde X$ such that $X I_{\{\tau>t\}} = \tilde X I_{\{\tau>t\}}$. This is quite
intuitive since prior to default all information is generated by the background
filtration $(\mathcal{F}_t)$; a formal proof is given in Section 5.1.1 of Bielecki and Rutkowski
(2002). Now $E(I_{\{\tau>t\}}X \mid \mathcal{G}_t)$ is $\mathcal{G}_t$-measurable and zero on $\{\tau \le t\}$. There is therefore
an $\mathcal{F}_t$-measurable rv $Z$ such that $E(I_{\{\tau>t\}}X \mid \mathcal{G}_t) = I_{\{\tau>t\}}Z$. Taking conditional
expectations with respect to $\mathcal{F}_t$ and noting that $\mathcal{F}_t \subset \mathcal{G}_t$ yields

$$E(I_{\{\tau>t\}}X \mid \mathcal{F}_t) = P(\tau > t \mid \mathcal{F}_t)\,Z.$$

Hence $Z = E(I_{\{\tau>t\}}X \mid \mathcal{F}_t)/P(\tau > t \mid \mathcal{F}_t)$, which proves the proposition.


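Corollary 10.17. Let $T > t$ and assume that $\tau$ is doubly stochastic with hazard
process $(\gamma_t)$. If the rv $\tilde X$ is integrable and $\mathcal{F}_T$-measurable, we have

$$E(I_{\{\tau>T\}}\tilde X \mid \mathcal{G}_t) = I_{\{\tau>t\}}\,E\Bigl(\exp\Bigl(-\int_t^T \gamma_s\,\mathrm{d}s\Bigr)\tilde X \,\Big|\, \mathcal{F}_t\Bigr).$$


Proof. Let $X := I_{\{\tau>T\}}\tilde X$. Since $X = I_{\{\tau>t\}}X$ (as $T > t$), Proposition 10.16 yields

$$E(I_{\{\tau>T\}}\tilde X \mid \mathcal{G}_t) = E(I_{\{\tau>t\}}X \mid \mathcal{G}_t) = I_{\{\tau>t\}}\,e^{\int_0^t \gamma_s\,\mathrm{d}s}\,E(I_{\{\tau>T\}}\tilde X \mid \mathcal{F}_t),$$

where we have used the fact that

$$P(\tau > t \mid \mathcal{F}_t) = \exp\Bigl(-\int_0^t \gamma_s\,\mathrm{d}s\Bigr).$$

Since $\tilde X$ is $\mathcal{F}_T$-measurable,

$$E(I_{\{\tau>T\}}\tilde X \mid \mathcal{F}_t) = E(\tilde X\,P(\tau > T \mid \mathcal{F}_T) \mid \mathcal{F}_t) = E\Bigl(\tilde X\exp\Bigl(-\int_0^T \gamma_s\,\mathrm{d}s\Bigr) \,\Big|\, \mathcal{F}_t\Bigr),$$

and the result follows.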


Corollary 10.17 will be very useful for the pricing of various credit-risky securities
in models with doubly stochastic default times. Moreover, the corollary implies that
in the above setting $\gamma_t$ gives a good approximation to the one-year default probability.
This follows by setting $T = t + 1$ and $\tilde X = 1$ to obtain

$$P(\tau > t+1 \mid \mathcal{G}_t) = I_{\{\tau>t\}}\,E\Bigl(\exp\Bigl(-\int_t^{t+1}\gamma_s\,\mathrm{d}s\Bigr) \,\Big|\, \mathcal{F}_t\Bigr). \tag{10.50}$$

Now assume that $\tau > t$ and that the hazard rate remains relatively stable over
the time interval $[t, t+1]$. Under these assumptions the right-hand side of (10.50)
is approximated reasonably well by $e^{-\gamma_t}$ and, if $\gamma_t$ is small, the one-year default
probability satisfies

$$P(\tau \le t+1 \mid \mathcal{G}_t) \approx 1 - e^{-\gamma_t} \approx \gamma_t. \tag{10.51}$$
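As a quick numerical illustration of the last approximation (the hazard-rate values 0.02 and 0.2 are assumptions chosen for the example, not figures from the text), a small hazard rate gives

$$1 - e^{-0.02} \approx 0.0198 \approx 0.02,$$

so the first-order approximation is off by about 1% in relative terms. For a much larger hazard rate the gap is no longer negligible:

$$1 - e^{-0.2} \approx 0.1813 \quad\text{versus}\quad \gamma_t = 0.2.$$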
10.5.2 Pricing Formulas


The main result of this section concerns the pricing of three types of contingent 
claims that can be used as building blocks for constructing the pay-off of many 
important credit-risky securities. We will show that, for a default time that is doubly 
stochastic, the computation of prices for these claims can be reduced to a pricing 
problem for a corresponding default-free claim if we adjust the interest rate and
replace the default-free interest rate $r_t$ by the sum $R_t = r_t + \gamma_t$ of the default-free
interest rate and the hazard rate of the default time.


The model. We consider a firm whose default time is given by a doubly stochastic 
random time as in Section 10.5.1. The economic background filtration represents the 
information generated by an arbitrage-free and complete model for non-defaultable 
security prices. More precisely, let $(\Omega,\mathcal{F},(\mathcal{F}_t),Q)$ denote a filtered probability
space, where $Q$ is the equivalent martingale measure. Prices of default-free securities
such as default-free bonds and the default-free rate of interest $(r_t)$ are $(\mathcal{F}_t)$-adapted
processes; $B_t = \exp(\int_0^t r_s\,\mathrm{d}s)$ models the default-free savings account.

Let $\tau$ be the default time of some company under consideration and let $Y_t = I_{\{\tau\le t\}}$
be the associated default indicator process. As before we set $\mathcal{H}_t = \sigma(\{Y_s : s \le t\})$
and $\mathcal{G}_t = \mathcal{F}_t \vee \mathcal{H}_t$; we assume that default is observable and that investors have
access to the information contained in the background filtration $(\mathcal{F}_t)$, so that the
information available to investors at time $t$ is given by $\mathcal{G}_t$. We consider a market
for credit products that is liquid enough that we may use the martingale-modelling
approach, and we use $Q$ as the pricing measure for defaultable securities. According
to (10.30), the price at time $t$ of an arbitrary, non-negative, $\mathcal{G}_T$-measurable contingent
claim $H$ is therefore given by

$$H_t = E^Q\Bigl(\exp\Bigl(-\int_t^T r_s\,\mathrm{d}s\Bigr)H \,\Big|\, \mathcal{G}_t\Bigr). \tag{10.52}$$

Finally, we assume that, under $Q$, the default time $\tau$ is a doubly stochastic random
time with background filtration $(\mathcal{F}_t)$ and hazard process $(\gamma_t)$. This latter assumption
is crucial for the results that follow.


Definition 10.18. We introduce the following building blocks. 


(i) A survival claim, i.e. an $\mathcal{F}_T$-measurable promised payment $X$ that is made
at time $T$ if there is no default; the actual payment of the survival claim
equals $X I_{\{\tau>T\}}$.


(ii) A risky dividend stream. Here, we consider a promised dividend stream given
by the $(\mathcal{F}_t)$-adapted rate process $\nu_s$, $0 \le s \le T$. The payments of a risky
dividend stream stop when default occurs, so that the actual payments of this
building block are given by the dividend stream with rate $\nu_t I_{\{\tau>t\}}$, $0 \le t \le T$.

(iii) A payment-at-default claim of the form $Z_\tau I_{\{\tau\le T\}}$, where $Z = (Z_t)_{t\ge 0}$ is an
$(\mathcal{F}_t)$-adapted stochastic process and where $Z_\tau$ is short for $Z_{\tau(\omega)}(\omega)$. Note
that the payment is made directly at $\tau$, provided that $\tau \le T$, where $T$ is the
maturity date of the claim.
Recall from Section 10.4.3 that defaultable bonds can be viewed as portfolios
of survival claims and payment-at-default claims. Credit default swaps can also be
written as a combination of these claims, as will be shown in Section 10.5.3. A further
example is provided by option contracts that are subject to counterparty risk. For
concreteness we consider a call option on some default-free security $(S_t)$. Denote
the exercise price by $K$ and the maturity date by $T$ and suppose that if the writer
defaults at time $\tau \le T$, then the owner of the option receives a fraction $(1 - \delta_\tau)$
of the intrinsic value of the option at the time of default. This can be modelled as
a combination of the survival claim $(S_T - K)^+ I_{\{\tau>T\}}$ and the payment-at-default
claim $(1 - \delta_\tau)(S_\tau - K)^+ I_{\{\tau\le T\}}$.


Pricing results. In the following theorem we show that the pricing of the building
blocks introduced in Definition 10.18 can be reduced to a pricing problem in
a default-free security market model with investor information given by the background
filtration $(\mathcal{F}_t)$ and with adjusted default-free interest rate.


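Theorem 10.19. Suppose that, under $Q$, $\tau$ is doubly stochastic with background
filtration $(\mathcal{F}_t)$ and hazard process $(\gamma_t)$. Define $R_s := r_s + \gamma_s$. Assume that the rvs
$\exp(-\int_t^T r_s\,\mathrm{d}s)|X|$, $\int_t^T |\nu_s|\exp(-\int_t^s r_u\,\mathrm{d}u)\,\mathrm{d}s$ and $\int_t^T |Z_s\gamma_s|\exp(-\int_t^s R_u\,\mathrm{d}u)\,\mathrm{d}s$
are all integrable with respect to $Q$. Then the following identities hold:

$$E^Q\Bigl(\exp\Bigl(-\int_t^T r_s\,\mathrm{d}s\Bigr)I_{\{\tau>T\}}X \,\Big|\, \mathcal{G}_t\Bigr)
= I_{\{\tau>t\}}\,E^Q\Bigl(\exp\Bigl(-\int_t^T R_s\,\mathrm{d}s\Bigr)X \,\Big|\, \mathcal{F}_t\Bigr), \tag{10.53}$$

$$E^Q\Bigl(\int_t^T \nu_s I_{\{\tau>s\}}\exp\Bigl(-\int_t^s r_u\,\mathrm{d}u\Bigr)\,\mathrm{d}s \,\Big|\, \mathcal{G}_t\Bigr)
= I_{\{\tau>t\}}\,E^Q\Bigl(\int_t^T \nu_s\exp\Bigl(-\int_t^s R_u\,\mathrm{d}u\Bigr)\,\mathrm{d}s \,\Big|\, \mathcal{F}_t\Bigr), \tag{10.54}$$

$$E^Q\Bigl(I_{\{\tau>t\}}\exp\Bigl(-\int_t^\tau r_s\,\mathrm{d}s\Bigr)Z_\tau I_{\{\tau\le T\}} \,\Big|\, \mathcal{G}_t\Bigr)
= I_{\{\tau>t\}}\,E^Q\Bigl(\int_t^T Z_s\gamma_s\exp\Bigl(-\int_t^s R_u\,\mathrm{d}u\Bigr)\,\mathrm{d}s \,\Big|\, \mathcal{F}_t\Bigr). \tag{10.55}$$


Proof. The integrability conditions ensure that all conditional expectations are well
defined. We start with the pricing formula (10.53) for the vulnerable claim. Define
the $\mathcal{F}_T$-measurable rv $\tilde X := \exp(-\int_t^T r_s\,\mathrm{d}s)X$. Using Corollary 10.17
and the notation $\Gamma_t = \int_0^t \gamma_s\,\mathrm{d}s$ we find that

$$E^Q(\tilde X I_{\{\tau>T\}} \mid \mathcal{G}_t) = I_{\{\tau>t\}}\,E^Q(\exp(-(\Gamma_T - \Gamma_t))\tilde X \mid \mathcal{F}_t).$$

Noting that $\Gamma_T - \Gamma_t = \int_t^T \gamma_s\,\mathrm{d}s$ and using the definition of $\tilde X$, it follows that the
right-hand side equals $I_{\{\tau>t\}}E^Q(\exp(-\int_t^T R_s\,\mathrm{d}s)X \mid \mathcal{F}_t)$. The pricing formula (10.54)
follows from (10.53) and the Fubini Theorem for conditional expectations. Finally,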
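we turn to (10.55). Proposition 10.16 implies that

$$E^Q\Bigl(I_{\{\tau>t\}}\exp\Bigl(-\int_t^\tau r_s\,\mathrm{d}s\Bigr)Z_\tau I_{\{\tau\le T\}} \,\Big|\, \mathcal{G}_t\Bigr)
= I_{\{\tau>t\}}\,\frac{E^Q\bigl(I_{\{\tau>t\}}\exp(-\int_t^\tau r_s\,\mathrm{d}s)Z_\tau I_{\{\tau\le T\}} \mid \mathcal{F}_t\bigr)}{P(\tau > t \mid \mathcal{F}_t)}. \tag{10.56}$$

Now note that

$$P(\tau \le t \mid \mathcal{F}_T) = 1 - \exp\Bigl(-\int_0^t \gamma_s\,\mathrm{d}s\Bigr),$$

so the conditional density of $\tau$ given $\mathcal{F}_T$ equals $f_{\tau\mid\mathcal{F}_T}(t) = \gamma_t\exp(-\int_0^t \gamma_s\,\mathrm{d}s)$.
Hence

$$\begin{aligned}
E^Q\Bigl(I_{\{\tau>t\}}\exp\Bigl(-\int_t^\tau r_s\,\mathrm{d}s\Bigr)Z_\tau I_{\{\tau\le T\}} \,\Big|\, \mathcal{F}_T\Bigr)
&= \int_t^T \exp\Bigl(-\int_t^s r_u\,\mathrm{d}u\Bigr)Z_s\gamma_s\exp\Bigl(-\int_0^s \gamma_u\,\mathrm{d}u\Bigr)\,\mathrm{d}s\\
&= \exp\Bigl(-\int_0^t \gamma_u\,\mathrm{d}u\Bigr)\int_t^T Z_s\gamma_s\exp\Bigl(-\int_t^s R_u\,\mathrm{d}u\Bigr)\,\mathrm{d}s.
\end{aligned}$$

Using iterated conditional expectations we obtain the formula

$$E^Q\Bigl(I_{\{\tau>t\}}\exp\Bigl(-\int_t^\tau r_s\,\mathrm{d}s\Bigr)Z_\tau I_{\{\tau\le T\}} \,\Big|\, \mathcal{F}_t\Bigr)
= \exp\Bigl(-\int_0^t \gamma_u\,\mathrm{d}u\Bigr)E^Q\Bigl(\int_t^T Z_s\gamma_s\exp\Bigl(-\int_t^s R_u\,\mathrm{d}u\Bigr)\,\mathrm{d}s \,\Big|\, \mathcal{F}_t\Bigr),$$

and the identity (10.55) follows from (10.56).
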


10.5.3 Applications 


Credit default swaps. We extend our analysis of Section 10.4.4 and discuss the 
pricing of CDSs in models where the default time is doubly stochastic. This allows 
us to incorporate stochastic interest rates, recovery rates and hazard rates into the 
analysis. 

We quickly recall the form of the payments of the CDS contract. As in our previous
analysis, the premium payments are due at $N$ points in time $0 < t_1 < \cdots < t_N$; at a
pre-default date $t_k$, the protection buyer pays a premium of size $x(t_k - t_{k-1})$, where
$x$ denotes the swap spread in percentage points (again we take the nominal of the
swap to be one). If $\tau \le t_N$, the protection seller makes a default payment of size $\delta_\tau$
to the buyer at the default time $\tau$, where the percentage loss given default is now
a general $(\mathcal{F}_t)$-adapted process. Using Theorem 10.19, both legs of the swap can
be priced. The regular premium payments constitute a sequence of survival claims.
Using (10.53) the fair price of the premium leg at $t = 0$ is

$$\begin{aligned}
V_0^{\mathrm{prem}} &= \sum_{k=1}^N E^Q\Bigl(\exp\Bigl(-\int_0^{t_k} r_u\,\mathrm{d}u\Bigr)x\,(t_k - t_{k-1})\,I_{\{\tau>t_k\}}\Bigr)\\
&= x\sum_{k=1}^N (t_k - t_{k-1})\,E^Q\Bigl(\exp\Bigl(-\int_0^{t_k} R_u\,\mathrm{d}u\Bigr)\Bigr).
\end{aligned}$$

The default payment leg is a payment-at-default claim with $Z_s = \delta_s$ and maturity $t_N$,
so its value is given by $V_0^{\mathrm{def}} = E^Q(\int_0^{t_N}\delta_s\gamma_s\exp(-\int_0^s R_u\,\mathrm{d}u)\,\mathrm{d}s)$. We have therefore
reduced the pricing of credit default swaps to a pricing problem in the default-free
world. Methods for solving this problem will be discussed in the next section.
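One generic way to evaluate these two expectations is plain Monte Carlo over hazard paths; note that no default times need to be simulated, which is exactly what the reduction to the default-free world buys. The sketch below is an illustration under strong assumptions that are not part of the text: a constant interest rate `r`, a constant loss given default `delta`, and a hypothetical log-Ornstein-Uhlenbeck hazard process with made-up parameters.

```python
import math
import random

def cds_legs_mc(n_paths, dates, r=0.03, delta=0.6, gamma0=0.02, kappa=0.5,
                theta=0.02, sigma=0.5, steps_per_period=25, seed=0):
    """Monte Carlo estimates of the premium annuity, default leg and fair spread at t = 0."""
    rng = random.Random(seed)
    annuity_sum, default_sum = 0.0, 0.0
    for _path in range(n_paths):
        y = math.log(gamma0)      # log-hazard (illustrative dynamics)
        gam = gamma0
        cum_R = 0.0               # int_0^s R_u du along the current path
        t_prev = 0.0
        for t_k in dates:
            dt = (t_k - t_prev) / steps_per_period
            for _step in range(steps_per_period):
                y += kappa * (math.log(theta) - y) * dt \
                     + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
                gam_next = math.exp(y)
                f_left = delta * gam * math.exp(-cum_R)
                cum_R += (r + 0.5 * (gam + gam_next)) * dt
                f_right = delta * gam_next * math.exp(-cum_R)
                # Trapezoidal rule for int delta_s gamma_s exp(-int_0^s R_u du) ds.
                default_sum += 0.5 * dt * (f_left + f_right)
                gam = gam_next
            # Premium leg per unit spread: (t_k - t_{k-1}) exp(-int_0^{t_k} R_u du).
            annuity_sum += (t_k - t_prev) * math.exp(-cum_R)
            t_prev = t_k
    annuity = annuity_sum / n_paths
    default_leg = default_sum / n_paths
    return annuity, default_leg, default_leg / annuity


if __name__ == "__main__":
    dates = [0.25 * k for k in range(1, 21)]   # quarterly premiums over 5 years
    annuity, default_leg, spread = cds_legs_mc(2000, dates)
    print(f"fair spread estimate: {1e4 * spread:.1f} bp")
```

Setting `sigma=0.0` and `kappa=0.0` freezes the hazard rate at `gamma0`, which recovers the deterministic setting of Section 10.4.4 and provides a convenient sanity check.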


Recovery of market value. Recovery of market value, abbreviated RM, is an alter- 
native recovery model for defaultable bonds and other credit-risky securities that 
has been put forward by Duffie and Singleton (1999); its main virtue is the fact 
that it leads to particularly simple pricing formulas. Consider a claim whose pay- 
off consists of the survival claim X and a recovery payment at the default time. 
Under the RM hypothesis it is assumed that this recovery payment is given by
$(1 - \delta_\tau)V_\tau I_{\{\tau\le T\}}$, where the $(\mathcal{F}_t)$-adapted process $(\delta_t) \in (0,1)$ gives the
percentage loss given default of the claim and where the $(\mathcal{F}_t)$-adapted process $(V_t)$ gives
the pre-default value of the claim. Note that this is a recursive definition, as the
pre-default value at time $t$ also depends on the form of the recovery payments in the
time period $(t, T]$. Nonetheless, the following result can be established.


Proposition 10.20. Suppose that, under $Q$, $\tau$ is doubly stochastic with hazard rate
process $(\gamma_t)$. Suppose, moreover, that $X$ is integrable and that the RM assumption
holds. Then the pre-default value process $(V_t)$ is uniquely determined and is given
by

$$V_t = E^Q\Bigl(\exp\Bigl(-\int_t^T (r_s + \delta_s\gamma_s)\,\mathrm{d}s\Bigr)X \,\Big|\, \mathcal{F}_t\Bigr),\quad 0 \le t \le T. \tag{10.57}$$


Note that for $\delta_t \equiv 1$ the claim is a standard survival claim; in that case,
(10.57) reduces to the formula (10.53). On the other hand, for $\delta_t \equiv 0$ the claim is
essentially default free; in that case, (10.57) reduces to the standard pricing formula
for the claim X in a default-free security market model. For a proof of Proposi- 
tion 10.20 we refer to the references given in Notes and Comments. 


Credit spreads and hazard rates. With doubly stochastic default times the risk-neutral
hazard process $(\gamma_t)$ and the credit spread

$$c(t, T) = -\frac{1}{T - t}\big(\ln p_1(t, T) - \ln p_0(t, T)\big)$$

of defaultable bonds are closely related. Analytic results are most easily derived for
the instantaneous credit spread given by

$$c(t, t) = \lim_{T \to t} c(t, T) = -\frac{\partial}{\partial T}\Big|_{T=t}\big(\ln p_1(t, T) - \ln p_0(t, T)\big). \tag{10.58}$$

Assume that $\tau > t$, so that $p_1(t, t) = p_0(t, t) = 1$. We therefore obtain

$$-\frac{\partial}{\partial T}\Big|_{T=t} \ln p_1(t, T) = -\frac{\partial}{\partial T}\Big|_{T=t} p_1(t, T), \tag{10.59}$$


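and similarly for $p_0(t, T)$. To compute the derivative in (10.59) we need to distinguish
between the different recovery models. Under the RM hypothesis we can
apply Proposition 10.20 with $X = 1$. Exchanging expectation and differentiation
we obtain

$$-\frac{\partial}{\partial T}\Big|_{T=t} p_1(t, T)
= -E^Q\Big(\frac{\partial}{\partial T}\Big|_{T=t} \exp\Big(-\int_t^T (r_s + \delta_s \gamma_s)\,ds\Big) \,\Big|\, \mathcal{F}_t\Big)
= r_t + \delta_t \gamma_t. \tag{10.60}$$

Applying (10.60) with $\delta_t \equiv 0$ yields

$$-\frac{\partial}{\partial T}\Big|_{T=t} p_0(t, T) = r_t,$$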


so that $c(t, t) = \delta_t \gamma_t$, i.e. the instantaneous credit spread equals the product of the
hazard rate and the percentage loss given default, which is quite intuitive from an
economic point of view. Under the RF recovery model, $p_1(t, T)$ is given by the sum
of the price of a survival claim $I_{\{\tau > T\}}$ and a payment at default of size $(1 - \delta_\tau)$.
Equation (10.60) with $\delta_t \equiv 1$ shows that the derivative with respect to $T$ of the
survival claim at $T = t$ is equal to $-(r_t + \gamma_t)$. For the recovery payment we get

$$\frac{\partial}{\partial T}\Big|_{T=t} E^Q\Big(\int_t^T (1 - \delta_s)\gamma_s \exp\Big(-\int_t^s R_u\,du\Big)\,ds \,\Big|\, \mathcal{F}_t\Big) = (1 - \delta_t)\gamma_t.$$

Hence

$$-\frac{\partial}{\partial T}\Big|_{T=t} p_1(t, T) = r_t + \gamma_t - (1 - \delta_t)\gamma_t = r_t + \delta_t \gamma_t,$$

so that $c(t, t)$ is again equal to $\delta_t \gamma_t$. An analogous computation shows that we also
have $c(t, t) = \delta_t \gamma_t$ under RT. However, for $T - t > 0$, the credit spreads corresponding
to the different recovery models differ, as is illustrated in Section 10.6.3.
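The statements about spreads under the different recovery models can be checked numerically in a toy setting. The sketch below assumes constant $r$, $\gamma$ and $\delta$ (an illustrative simplification, not the general model of the text), in which case the RM price follows from (10.57) with $X = 1$ and the RF price is the survival claim plus a payment-at-default claim of size $1 - \delta$. All function names and numbers are hypothetical.

```python
import math

def credit_spread(p1, p0, ttm):
    """c(t, T) = -(ln p1(t, T) - ln p0(t, T)) / (T - t)."""
    return -(math.log(p1) - math.log(p0)) / ttm

def p0_price(r, ttm):
    """Default-free zero-coupon bond for a constant short rate r."""
    return math.exp(-r * ttm)

def p1_rm(r, gamma, delta, ttm):
    """RM price: (10.57) with X = 1 and constant r, gamma, delta."""
    return math.exp(-(r + delta * gamma) * ttm)

def p1_rf(r, gamma, delta, ttm):
    """RF price: survival claim plus payment (1 - delta) at default."""
    R = r + gamma
    survival = math.exp(-R * ttm)
    recovery = (1.0 - delta) * gamma * (1.0 - math.exp(-R * ttm)) / R
    return survival + recovery

r, gamma, delta = 0.06, 0.05, 0.5  # hypothetical values
spread_rm = credit_spread(p1_rm(r, gamma, delta, 7.0), p0_price(r, 7.0), 7.0)
spread_rf_short = credit_spread(p1_rf(r, gamma, delta, 1e-4), p0_price(r, 1e-4), 1e-4)
spread_rf_long = credit_spread(p1_rf(r, gamma, delta, 50.0), p0_price(r, 50.0), 50.0)
```

With constant parameters the RM spread equals $\delta\gamma$ at every maturity, while the RF spread starts at $\delta\gamma$ for $T \to t$ and drifts away from it as the time to maturity grows.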


Notes and Comments 


The material discussed in this section is based on many sources. We mention in par- 
ticular the books by Lando (2004) and Bielecki and Rutkowski (2002). The text by 
Bielecki and Rutkowski is more technical than our presentation; among other things 
the authors discuss various probabilistic characterizations of doubly stochastic ran- 
dom times. The threshold-simulation approach for doubly stochastic random times 
requires the simulation of trajectories of the hazard process. An excellent source for 
simulation techniques for stochastic processes is Glasserman (2003). 

More general reduced-form models where the default time $\tau$ is not doubly stochastic
are discussed, for example, in Kusuoka (1999), Elliott, Jeanblanc and Yor (2000),
Bélanger, Shreve and Wong (2004), Collin-Dufresne, Goldstein and Hugonnier
(2004) and Blanchet-Scalliet and Jeanblanc (2004).

Theorem 10.19 is originally due to Lando (1998); related results were obtained
by Jarrow and Turnbull (1995) and Jarrow, Lando and Turnbull (1997). Proposi- 
tion 10.20 is due to Duffie and Singleton (1999); extensions are discussed in Becherer 
and Schweizer (2005). An excellent text for the overall mathematical background 
is Jeanblanc, Yor and Chesney (2009). 

The analogy with default-free term structure models makes the reduced-form 
models with doubly stochastic default times relatively easy to apply. However, some 
care is required in interpreting the results and applying the linear pricing rules for 
corporate debt that the models imply. In particular, one must bear in mind that in 
these models the default intensity does not explicitly take into account the structure 
of a firm’s outstanding risky debt. A formal analysis of the effect of debt structure 
on bond values is best carried out in the context of firm-value models, where the 
default is explicitly modelled in terms of fundamental economic quantities. A good 
discussion of these issues can be found in Chapter 2 of Lando (2004). 


10.6 Affine Models 


In order to apply the pricing formulas for doubly stochastic random times obtained
in Theorem 10.19 we need effective ways to evaluate the conditional expectations on
the right-hand side of equations (10.53), (10.54) and (10.55). In most models where
default is modelled by a doubly stochastic random time, $(r_t)$ and $(\gamma_t)$ are modelled
as functions of some $p$-dimensional Markovian state variable process $(\Psi_t)$ with state
space given by the domain $D \subset \mathbb{R}^p$, so that the natural background filtration is given
by $(\mathcal{F}_t) = \sigma(\{\Psi_s : s \le t\})$. Moreover, $R_t := r_t + \gamma_t$ is of the form $R_t = R(\Psi_t)$
for some function $R : D \subset \mathbb{R}^p \to \mathbb{R}_+$. We therefore have to compute conditional
expectations of the form

$$E\Big(\exp\Big(-\int_t^T R(\Psi_s)\,ds\Big) g(\Psi_T) + \int_t^T h(\Psi_s) \exp\Big(-\int_t^s R(\Psi_u)\,du\Big)\,ds \,\Big|\, \mathcal{F}_t\Big) \tag{10.61}$$


for generic functions $g, h : D \to \mathbb{R}_+$. Since $(\Psi_t)$ is a Markov process, this conditional
expectation is given by some function $f(t, \Psi_t)$ of time and the current value
$\Psi_t$ of the state variable process. It is well known that under some additional regularity
assumptions the function $f$ can be computed as the solution of a parabolic partial
differential equation (PDE); this is the celebrated Feynman-Kac formula. The
Feynman-Kac formula provides a way to determine $f$ using analytical or numerical
techniques for PDEs. In particular, it is known that in the case where $(\Psi_t)$ belongs to
the class of affine jump diffusions (see below), $R$ is an affine function, $g(\psi) = e^{u'\psi}$
for some $u \in \mathbb{R}^p$ and $h \equiv 0$, the function $f$ takes the form

$$f(t, \psi) = \exp(\alpha(t, T) + \beta(t, T)'\psi) \tag{10.62}$$

for deterministic functions $\alpha : [0, T] \to \mathbb{R}$ and $\beta : [0, T] \to \mathbb{R}^p$; moreover, $\alpha$ and
$\beta$ are determined by a $(p + 1)$-dimensional ordinary differential equation (ODE)
system that is easily solved numerically. Models based on affine jump diffusions
and an affine specification of $R$ are therefore relatively easy to implement, which
explains their popularity in practice. A relationship of the form (10.62) is often
termed an affine term structure, as it implies that continuously compounded yields
of bonds at time $t$ are affine functions of $\Psi_t$.

In this section we discuss these results. We concentrate on the case where the state 
variable process is given by a one-dimensional diffusion; extensions to processes 
with jumps will be considered briefly at the end. 


10.6.1 Basic Results 


The PDE characterization of $f$. We assume that the state variable process $(\Psi_t)$ is
the unique solution of the SDE

$$d\Psi_t = \mu(\Psi_t)\,dt + \sigma(\Psi_t)\,dW_t, \quad \Psi_0 = \psi \in D, \tag{10.63}$$

with state space given by the domain $D \subset \mathbb{R}$. Here, $(W_t)$ is a standard,
one-dimensional Brownian motion on some filtered probability space $(\Omega, \mathcal{F}, (\mathcal{F}_t), P)$,
and $\mu$ and $\sigma$ are continuous functions from $D$ to $\mathbb{R}$ and $D$ to $\mathbb{R}_+$, respectively. The
next result shows that the conditional expectation (10.61) can be computed as the
solution of a parabolic PDE.


Lemma 10.21 (Feynman-Kac). Consider generic functions $R, g, h : D \to \mathbb{R}_+$.
Suppose that the function $f : [0, T] \times D \to \mathbb{R}$ is continuous, once continuously
differentiable in $t$ and twice continuously differentiable in $\psi$ on $[0, T) \times D$, and
that $f$ solves the terminal-value problem

$$f_t + \mu(\psi) f_\psi + \tfrac{1}{2}\sigma^2(\psi) f_{\psi\psi} + h(\psi) = R(\psi) f, \quad (t, \psi) \in [0, T) \times D,$$
$$f(T, \psi) = g(\psi), \quad \psi \in D. \tag{10.64}$$

If $f$ is bounded or, more generally, if $\max_{0 \le t \le T} f(t, \psi) \le C(1 + \psi^2)$ for $\psi \in D$,
then

$$E\Big(\exp\Big(-\int_t^T R(\Psi_s)\,ds\Big) g(\Psi_T) + \int_t^T h(\Psi_s) \exp\Big(-\int_t^s R(\Psi_u)\,du\Big)\,ds \,\Big|\, \mathcal{F}_t\Big) = f(t, \Psi_t). \tag{10.65}$$


The Feynman-Kac formula is a standard result of stochastic calculus and it is
discussed in many textbooks on stochastic processes and financial mathematics, so
we omit the proof (references are given in Notes and Comments).


Affine term structure. We begin with the case $h \equiv 0$; in financial terms this means
that we concentrate on survival claims. The following assumption ensures that for
$h \equiv 0$ the solution of the PDE (10.64), with terminal condition $g(\psi) = e^{u\psi}$,
$u \le 0$, for $\psi \in D$, is of the form (10.62), so that we have an affine term structure.
Note that $g \equiv 1$ for $u = 0$; this is the appropriate terminal condition for pricing
zero-coupon bonds.

Assumption 10.22. $R$, $\mu$ and $\sigma^2$ are affine functions of $\psi$, i.e. there are constants
$\rho^0$, $\rho^1$, $k^0$, $k^1$, $h^0$ and $h^1$ such that $R(\psi) = \rho^0 + \rho^1\psi$, $\mu(\psi) = k^0 + k^1\psi$ and
$\sigma^2(\psi) = h^0 + h^1\psi$. Moreover, for all $\psi \in D$ we have $h^0 + h^1\psi \ge 0$ and
$\rho^0 + \rho^1\psi \ge 0$.


Fix some $T > 0$. We try to find a solution of (10.64) of the form
$f(t, \psi) = \exp(\alpha(t, T) + \beta(t, T)\psi)$ for continuously differentiable functions $\alpha(\cdot, T)$ and
$\beta(\cdot, T)$. As $f(T, \psi) = e^{u\psi}$, we immediately obtain the terminal conditions
$\alpha(T, T) = 0$, $\beta(T, T) = u$. Denote by $\alpha'(\cdot, T)$ and $\beta'(\cdot, T)$ the derivatives of
$\alpha$ and $\beta$ with respect to $t$. Using the special form of $f$ we obtain that

$$f_t = (\alpha' + \beta'\psi) f, \quad f_\psi = \beta f \quad\text{and}\quad f_{\psi\psi} = \beta^2 f.$$

Hence, under Assumption 10.22 the PDE (10.64) takes the form

$$(\alpha' + \beta'\psi) f + (k^0 + k^1\psi)\beta f + \tfrac{1}{2}(h^0 + h^1\psi)\beta^2 f = (\rho^0 + \rho^1\psi) f.$$

Dividing by $f$ and rearranging we obtain

$$\alpha' + k^0\beta + \tfrac{1}{2}h^0\beta^2 - \rho^0 + \big(\beta' + k^1\beta + \tfrac{1}{2}h^1\beta^2 - \rho^1\big)\psi = 0.$$

Since this equation must hold for all $\psi \in D$, we obtain the following ODE system:

$$\beta'(t, T) = \rho^1 - k^1\beta(t, T) - \tfrac{1}{2}h^1\beta^2(t, T), \quad \beta(T, T) = u, \tag{10.66}$$
$$\alpha'(t, T) = \rho^0 - k^0\beta(t, T) - \tfrac{1}{2}h^0\beta^2(t, T), \quad \alpha(T, T) = 0. \tag{10.67}$$

The ODE (10.66) for $\beta(\cdot, T)$ is a so-called Riccati equation. While explicit solutions
exist only in certain special cases, the ODE is easily solved numerically. The
ODE (10.67) for $\alpha(\cdot, T)$ can be solved by simple (numerical) integration once $\beta$
has been determined. Summing up, we have the following proposition.


Proposition 10.23. Suppose that Assumption 10.22 holds, that the ODE system
(10.66), (10.67) has a unique solution $(\alpha, \beta)$ on $[0, T]$, and that there is some $C$
such that $\beta(t, T)\psi \le C$ for all $t \in [0, T]$, $\psi \in D$. Then

$$E\Big(\exp\Big(-\int_t^T R(\Psi_s)\,ds\Big) e^{u\Psi_T} \,\Big|\, \mathcal{F}_t\Big) = \exp(\alpha(t, T) + \beta(t, T)\Psi_t).$$


Proof. The result follows immediately from Lemma 10.21, as our assumption on $\beta$
implies that $f(t, \psi) = \exp(\alpha(t, T) + \beta(t, T)\psi)$ is bounded.


10.6.2 The CIR Square-Root Diffusion 


A very popular affine model is the square-root diffusion model proposed by Cox,
Ingersoll and Ross (1985) as a model for the short rate of interest. In this model $(\Psi_t)$
is given by the solution of the SDE

$$d\Psi_t = \kappa(\bar\theta - \Psi_t)\,dt + \sigma\sqrt{\Psi_t}\,dW_t, \quad \Psi_0 = \psi > 0, \tag{10.68}$$

for parameters $\kappa, \bar\theta, \sigma > 0$ and state space $D = [0, \infty)$. Clearly, (10.68) is an affine
model in the sense of Assumption 10.22; the parameters are given by $k^0 = \kappa\bar\theta$,
$k^1 = -\kappa$, $h^0 = 0$ and $h^1 = \sigma^2$.

It is well known that the SDE (10.68) admits a global solution (see Notes and
Comments for a reference). This issue is non-trivial since the square-root function
is not Lipschitz and since one has to ensure that the solution remains in $D$ for all
$t > 0$. Note that (10.68) implies that $(\Psi_t)$ is a mean-reverting process: if $\Psi_t$ deviates
from the mean-reversion level $\bar\theta$, the process is pulled back towards $\bar\theta$. Moreover, if
the mean reversion is sufficiently strong relative to the volatility, trajectories never
reach zero. More precisely, let $\tau_0(\Psi) := \inf\{t \ge 0 : \Psi_t = 0\}$. It is well known
that for $\kappa\bar\theta \ge \tfrac{1}{2}\sigma^2$ one has $P(\tau_0(\Psi) < \infty) = 0$, whereas for $\kappa\bar\theta < \tfrac{1}{2}\sigma^2$ one has
$P(\tau_0(\Psi) < \infty) = 1$.

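In the CIR square-root model the Riccati equations (10.66) and (10.67) can be
solved explicitly. Using Proposition 10.23 it can be computed that

$$E\Big(\exp\Big(-\int_t^T (\rho^0 + \rho^1\Psi_s)\,ds\Big) \,\Big|\, \mathcal{F}_t\Big) = \exp(\alpha(T - t) + \beta(T - t)\Psi_t),$$

with

$$\beta(\tau) = \frac{-2\rho^1(e^{\gamma\tau} - 1)}{\gamma - \kappa + e^{\gamma\tau}(\gamma + \kappa)}, \tag{10.69}$$
$$\alpha(\tau) = -\rho^0\tau + 2\frac{\kappa\bar\theta}{\sigma^2} \ln\Big(\frac{2\gamma e^{\tau(\gamma + \kappa)/2}}{\gamma - \kappa + e^{\gamma\tau}(\gamma + \kappa)}\Big), \tag{10.70}$$

and $\tau := T - t$, $\gamma := \sqrt{\kappa^2 + 2\sigma^2\rho^1}$. These formulas are the key to pricing bonds in
models where the risk-free short rate and the default intensities are affine functions
of independent square-root processes, as is shown in the next example.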
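As a sanity check on (10.69) and (10.70), the closed-form expressions can be compared with a direct numerical solution of the Riccati system (10.66), (10.67) for the CIR coefficients $k^0 = \kappa\bar\theta$, $k^1 = -\kappa$, $h^0 = 0$, $h^1 = \sigma^2$ and $u = 0$. The sketch below is only illustrative: the function names, the classical fourth-order Runge-Kutta scheme and the parameter values are our own choices. Written in terms of the time to maturity $\tau = T - t$, the ODEs change sign, i.e. $d\beta/d\tau = -\rho^1 - \kappa\beta + \tfrac{1}{2}\sigma^2\beta^2$ and $d\alpha/d\tau = -\rho^0 + \kappa\bar\theta\beta$.

```python
import math

def cir_closed_form(tau, kappa, theta, sigma, rho0, rho1):
    """alpha(tau) and beta(tau) from (10.69) and (10.70)."""
    gamma = math.sqrt(kappa ** 2 + 2.0 * sigma ** 2 * rho1)
    e = math.exp(gamma * tau)
    denom = gamma - kappa + e * (gamma + kappa)
    beta = -2.0 * rho1 * (e - 1.0) / denom
    log_arg = 2.0 * gamma * math.exp(tau * (gamma + kappa) / 2.0) / denom
    alpha = -rho0 * tau + 2.0 * kappa * theta / sigma ** 2 * math.log(log_arg)
    return alpha, beta

def cir_riccati_rk4(tau, kappa, theta, sigma, rho0, rho1, steps=2000):
    """Solve (10.66), (10.67) in time to maturity with RK4, starting from 0."""
    def rhs(beta):
        d_alpha = -rho0 + kappa * theta * beta
        d_beta = -rho1 - kappa * beta + 0.5 * sigma ** 2 * beta ** 2
        return d_alpha, d_beta

    alpha, beta = 0.0, 0.0
    h = tau / steps
    for _ in range(steps):
        # The right-hand side depends on beta only, so alpha is a quadrature.
        a1, b1 = rhs(beta)
        a2, b2 = rhs(beta + 0.5 * h * b1)
        a3, b3 = rhs(beta + 0.5 * h * b2)
        a4, b4 = rhs(beta + h * b3)
        alpha += h * (a1 + 2.0 * a2 + 2.0 * a3 + a4) / 6.0
        beta += h * (b1 + 2.0 * b2 + 2.0 * b3 + b4) / 6.0
    return alpha, beta

# Hypothetical parameter values for illustration.
params = dict(kappa=0.6, theta=0.02, sigma=0.14, rho0=0.01, rho1=1.0)
alpha_cf, beta_cf = cir_closed_form(5.0, **params)
alpha_num, beta_num = cir_riccati_rk4(5.0, **params)
price = math.exp(alpha_cf + beta_cf * 0.03)  # conditional expectation for Psi_t = 0.03
```

Agreement of the two solutions to high accuracy indicates that the reconstructed formulas and the sign conventions of the ODE system are mutually consistent.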


Example 10.24 (a three-factor model). We now consider the pricing of zero-coupon
bonds in a three-factor model similar to models that are frequently used in
the literature. We assume that $\Psi_t = (\Psi_{t,1}, \Psi_{t,2}, \Psi_{t,3})'$ is a vector of three independent
square-root diffusions with dynamics $d\Psi_{t,i} = \kappa_i(\bar\theta_i - \Psi_{t,i})\,dt + \sigma_i\sqrt{\Psi_{t,i}}\,dW_{t,i}$ for
independent Brownian motions $(W_{t,i})$, $i = 1, 2, 3$. The risk-free short rate of interest
is given by $r_t = r_0 + \Psi_{t,2} - \Psi_{t,1}$ for a constant $r_0 \ge 0$; the hazard rate of the
counterparty under consideration is given by $\gamma_t = \gamma_1\Psi_{t,1} + \Psi_{t,3}$ for some constant
$\gamma_1 > 0$. This parametrization allows for negative instantaneous correlation between
$(r_t)$ and $(\gamma_t)$, which is in line with empirical evidence. Note, however, that this
negative correlation comes at the expense of possibly negative risk-free interest
rates. In this context the price of a default-free zero-coupon bond is given by

$$p_0(t, T) = E\Big(\exp\Big(-\int_t^T r_s\,ds\Big) \,\Big|\, \mathcal{F}_t\Big)
= e^{-r_0(T - t)}\, E\Big(\exp\Big(-\int_t^T \Psi_{s,2}\,ds\Big) \,\Big|\, \mathcal{F}_t\Big)\, E\Big(\exp\Big(\int_t^T \Psi_{s,1}\,ds\Big) \,\Big|\, \mathcal{F}_t\Big), \tag{10.71}$$


where we have used the independence of $(\Psi_{t,1})$, $(\Psi_{t,2})$, $(\Psi_{t,3})$. Each of the terms in
(10.71) can be evaluated using the above formulas for $\alpha$ and $\beta$ (equations (10.69)
and (10.70)). Assuming that we have recovery of treasury in default (see Section 10.4.3)
and a deterministic percentage loss given default $\delta$, we find that the
price of a defaultable zero-coupon bond is given by

$$p_1(t, T) = (1 - \delta)\,p_0(t, T) + \delta E\Big(\exp\Big(-\int_t^T (r_s + \gamma_s)\,ds\Big) \,\Big|\, \mathcal{F}_t\Big).$$

By definition of $r_t$ and $\gamma_t$ the last term on the right-hand side equals

$$\delta E\Big(\exp\Big(-\int_t^T \big(r_0 + (\gamma_1 - 1)\Psi_{s,1} + \Psi_{s,2} + \Psi_{s,3}\big)\,ds\Big) \,\Big|\, \mathcal{F}_t\Big),$$

which can be evaluated in a similar way to the evaluation of expression (10.71). In
the next section we will show how to deal with more complicated recovery models,
such as recovery of face value.
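The computation in Example 10.24 can be sketched as follows. Every factor-specific expectation is evaluated with (10.69) and (10.70), taking $\rho^0 = 0$ and $\rho^1$ equal to the loading of the factor in the exponent. All parameter values and function names below are hypothetical. Note that a negative loading $\rho^1$ (as for $\Psi_{t,1}$) requires $\kappa^2 + 2\sigma^2\rho^1 > 0$ for $\gamma$ to be real, which the illustrative parameters are chosen to satisfy.

```python
import math

def cir_transform(tau, psi, kappa, theta, sigma, rho1):
    """E(exp(-rho1 * int_t^T Psi_s ds) | Psi_t = psi), via (10.69), (10.70)
    with rho0 = 0. Requires kappa**2 + 2 * sigma**2 * rho1 > 0."""
    gamma = math.sqrt(kappa ** 2 + 2.0 * sigma ** 2 * rho1)
    e = math.exp(gamma * tau)
    denom = gamma - kappa + e * (gamma + kappa)
    beta = -2.0 * rho1 * (e - 1.0) / denom
    alpha = 2.0 * kappa * theta / sigma ** 2 * math.log(
        2.0 * gamma * math.exp(tau * (gamma + kappa) / 2.0) / denom)
    return math.exp(alpha + beta * psi)

# Hypothetical factor parameters: (kappa_i, theta_i, sigma_i, current value).
FACTORS = {
    1: (0.5, 0.01, 0.10, 0.01),
    2: (0.3, 0.03, 0.10, 0.03),
    3: (0.4, 0.02, 0.10, 0.02),
}

def factor_term(i, tau, rho1):
    kappa, theta, sigma, psi = FACTORS[i]
    return cir_transform(tau, psi, kappa, theta, sigma, rho1)

def bond_prices(tau, r0=0.01, gamma1=0.5, delta=0.6):
    """Return (p0, p1) under recovery of treasury for time to maturity tau."""
    discount = math.exp(-r0 * tau)
    # r_s = r0 + Psi_{s,2} - Psi_{s,1}: loading +1 on factor 2, -1 on factor 1.
    p0 = discount * factor_term(2, tau, 1.0) * factor_term(1, tau, -1.0)
    # r_s + gamma_s = r0 + (gamma1 - 1) Psi_{s,1} + Psi_{s,2} + Psi_{s,3}.
    survival = (discount * factor_term(1, tau, gamma1 - 1.0)
                * factor_term(2, tau, 1.0) * factor_term(3, tau, 1.0))
    p1 = (1.0 - delta) * p0 + delta * survival
    return p0, p1

p0, p1 = bond_prices(5.0)
```

Independence of the factors is what allows each expectation to be computed separately and multiplied, exactly as in (10.71).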


10.6.3 Extensions 


A jump-diffusion model for $(\Psi_t)$. We briefly discuss an extension of the basic
model (10.63), where the economic factor process $(\Psi_t)$ follows a diffusion with
jumps. Adding jumps to the dynamics of $(\Psi_t)$ provides more flexibility for modelling
default correlations in models with conditionally independent defaults (see
Section 17.3.2), and it also leads to more realistic credit spread dynamics.

In this section we assume that $(\Psi_t)$ is the unique solution of the SDE

$$d\Psi_t = \mu(\Psi_t)\,dt + \sigma(\Psi_t)\,dW_t + dZ_t, \quad \Psi_0 = \psi \in D. \tag{10.72}$$

Here, $(Z_t)$ is a pure jump process whose jump intensity at time $t$ is equal to $\lambda^Z(\Psi_t)$
for some function $\lambda^Z : D \to \mathbb{R}_+$, and whose jump-size distribution has df $\nu$ on $\mathbb{R}$.
Intuitively this means that, given the trajectory $(\Psi_t(\omega))_{t \ge 0}$ of the factor process,
$(Z_t)$ jumps at the jump times of an inhomogeneous Poisson process (see Section 13.2.7)
with time-varying intensity $\lambda^Z(\Psi_t)$; the size of the jumps has df $\nu$.

Suppose now that Assumption 10.22 holds, and that $\lambda^Z(\psi) = l^0 + l^1\psi$ for
constants $l^0$, $l^1$ such that $\lambda^Z(\psi) \ge 0$ for all $\psi \in D$. In that case we say that $(\Psi_t)$
follows an affine jump diffusion. For $x \in \mathbb{R}$ denote by $\hat\nu(x) = \int_{\mathbb{R}} e^{-xy}\,d\nu(y) \in (0, \infty]$
the extended Laplace-Stieltjes transform of $\nu$ (with domain $\mathbb{R}$ instead of the
usual domain $[0, \infty)$). Consider the following extension of the ODE system
(10.66), (10.67):

$$\beta'(t, T) = \rho^1 - k^1\beta(t, T) - \tfrac{1}{2}h^1\beta^2(t, T) - l^1\big(\hat\nu(-\beta(t, T)) - 1\big), \tag{10.73}$$
$$\alpha'(t, T) = \rho^0 - k^0\beta(t, T) - \tfrac{1}{2}h^0\beta^2(t, T) - l^0\big(\hat\nu(-\beta(t, T)) - 1\big), \tag{10.74}$$

with terminal condition $\beta(T, T) = u$ for some $u \le 0$ and $\alpha(T, T) = 0$. Suppose
that the system described by (10.74) and (10.73) has a unique solution $\alpha$, $\beta$ and
that $\beta(t, T)\psi \le C$ for all $t \in [0, T]$, $\psi \in D$ (for $l^0 \ne 0$ or $l^1 \ne 0$ this implicitly
implies that $\hat\nu(-\beta(t, T)) < \infty$ for all $t$). Define $f(t, \psi) = \exp(\alpha(t, T) + \beta(t, T)\psi)$.
Using similar arguments to those above it can then be shown that the conditional
expectation $E(\exp(-\int_t^T R(\Psi_s)\,ds)\,e^{u\Psi_T} \mid \mathcal{F}_t)$ equals $f(t, \Psi_t)$.

Table 10.4. Parameters used in the model of Duffie and Gârleanu (2001). Recall that $l^0$ gives
the intensity of jumps in the factor process, $\mu$ gives the average jump size. With these parameters
the average waiting time for a jump in the systematic factor process is $1/l^0 = 5$ years.

    $\kappa$    $\bar\theta$    $\sigma$    $l^0$    $\mu$
    0.6         0.02            0.14        0.2      0.1


Example 10.25 (the model of Duffie and Gârleanu (2001)). The following jump-diffusion
model has been used in the literature on CDO pricing. The dynamics of
$(\Psi_t)$ are given by

$$d\Psi_t = \kappa(\bar\theta - \Psi_t)\,dt + \sigma\sqrt{\Psi_t}\,dW_t + dZ_t \tag{10.75}$$

for parameters $\kappa, \bar\theta, \sigma > 0$ and a jump process $(Z_t)$ with constant jump intensity
$l^0 > 0$ and exponentially distributed jump sizes with parameter $1/\mu$. Following
Duffie and Gârleanu, we will sometimes call the model (10.75) a basic affine jump
diffusion. Note that these assumptions imply that the mean of $\nu$ is equal to $\mu$ and that $\nu$
has support $[0, \infty)$, so that $(\Psi_t)$ has only upwards jumps. The existence of a solution
to (10.75) therefore follows from the existence of solutions in the pure diffusion case.
It is relatively easy to show that for $t \to \infty$ we obtain $E(\Psi_t) \to \bar\theta + l^0\mu/\kappa$. For
illustrative purposes we present the parameter values used in Duffie and Gârleanu
(2001) in Table 10.4; a typical trajectory of $(\Psi_t)$ is simulated in Figure 10.8. Next
we compute the Laplace-Stieltjes transform $\hat\nu$. We obtain for $u > -1/\mu$ that

$$\hat\nu(u) = \int_0^\infty e^{-ux}\,\frac{1}{\mu}\,e^{-x/\mu}\,dx = \frac{1}{1 + \mu u};$$

for $u \le -1/\mu$ we get $\hat\nu(u) = \infty$. We therefore have all the necessary ingredients to
set up the Riccati equations (10.74) and (10.73). In the case of the model (10.75) it
is in fact possible to solve these equations explicitly (see, for example, Chapter 11
of Duffie and Singleton (2003)). However, the explicit solution is given by a very
lengthy expression so we omit the details.
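Instead of the explicit solution, the Riccati system is easily handled numerically. The following sketch uses the parameter values of Table 10.4 and computes (i) the long-run mean $\bar\theta + l^0\mu/\kappa$, (ii) a quadrature check of the transform $\hat\nu(u) = 1/(1 + \mu u)$ and (iii) the survival probability $E(\exp(-\int_0^T \Psi_s\,ds))$ from (10.73), (10.74) with $\rho^0 = 0$, $\rho^1 = 1$, $u = 0$ and $l^1 = 0$. The identification of the hazard rate with $\Psi_t$, the function names and the Runge-Kutta discretization are our own illustrative choices.

```python
import math

KAPPA, THETA, SIGMA, L0, MU = 0.6, 0.02, 0.14, 0.2, 0.1  # Table 10.4

def nu_hat(u, mu=MU):
    """Transform of exponential jump sizes with mean mu; infinite for u <= -1/mu."""
    return 1.0 / (1.0 + mu * u) if u > -1.0 / mu else math.inf

def nu_hat_quadrature(u, mu=MU, upper=10.0, n=100_000):
    """Midpoint-rule approximation of int_0^inf e^{-ux} (1/mu) e^{-x/mu} dx."""
    h = upper / n
    rate = u + 1.0 / mu
    return sum(math.exp(-rate * (i + 0.5) * h) for i in range(n)) * h / mu

def long_run_mean():
    return THETA + L0 * MU / KAPPA

def survival_probability(T, psi0, steps=2000):
    """exp(alpha + beta * psi0), solving (10.73), (10.74) in time to maturity."""
    def rhs(beta):
        jump = nu_hat(-beta) - 1.0
        d_alpha = KAPPA * THETA * beta + L0 * jump   # rho0 = 0, h0 = 0
        d_beta = -1.0 - KAPPA * beta + 0.5 * SIGMA ** 2 * beta ** 2  # rho1 = 1, l1 = 0
        return d_alpha, d_beta

    alpha, beta = 0.0, 0.0
    h = T / steps
    for _ in range(steps):
        a1, b1 = rhs(beta)
        a2, b2 = rhs(beta + 0.5 * h * b1)
        a3, b3 = rhs(beta + 0.5 * h * b2)
        a4, b4 = rhs(beta + h * b3)
        alpha += h * (a1 + 2.0 * a2 + 2.0 * a3 + a4) / 6.0
        beta += h * (b1 + 2.0 * b2 + 2.0 * b3 + b4) / 6.0
    return math.exp(alpha + beta * psi0)

psi0 = long_run_mean()
s1 = survival_probability(1.0, psi0)
s5 = survival_probability(5.0, psi0)
```

When the process starts at its long-run mean, $E(\Psi_t)$ is constant in $t$, so Jensen's inequality gives the lower bound $\exp(-\Psi_0 T)$ for the survival probability; this provides a simple plausibility check on the numerical solution.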


Application to payment-at-default claims. According to Theorem 10.19, in a model
with a doubly stochastic default time $\tau$ with risk-neutral hazard rate $\gamma(\Psi_t)$, the price
at $t$ of a payment-at-default claim of constant size $\delta > 0$ equals

$$\delta E\Big(\int_t^T \gamma(\Psi_s) \exp\Big(-\int_t^s R(\Psi_u)\,du\Big)\,ds \,\Big|\, \mathcal{F}_t\Big), \tag{10.76}$$

where again $R(\psi) = r(\psi) + \gamma(\psi)$. Using the Fubini Theorem this equals

$$\delta \int_t^T E\Big(\gamma(\Psi_s) \exp\Big(-\int_t^s R(\Psi_u)\,du\Big) \,\Big|\, \mathcal{F}_t\Big)\,ds. \tag{10.77}$$

Suppose now that $\gamma(\psi) = \gamma^0 + \gamma^1\psi$, that $R(\psi) = \rho^0 + \rho^1\psi$ and that $(\Psi_t)$
is given by an affine jump diffusion as introduced above. In that case the inner
expectation in (10.77) is given by a function $F(t, s, \Psi_t)$. This function can be computed
using an extension of the basic affine methodology, so that (10.77) can be
computed by one-dimensional numerical integration. Define for $0 \le t \le s$ the
function $f(t, s, \psi) = \exp(\alpha(t, s) + \beta(t, s)\psi)$, where $\alpha(\cdot, s)$ and $\beta(\cdot, s)$ solve the
ODEs (10.74), (10.73) with terminal condition $\alpha(s, s) = \beta(s, s) = 0$. Denote by
$\hat\nu'(x)$ the derivative of the Laplace-Stieltjes transform of $\nu$. It is then a straightforward
application of standard calculus to show that, modulo some integrability
conditions, $F(t, s, \psi) = f(t, s, \psi)(A(t, s) + B(t, s)\psi)$, where $A(\cdot, s)$ and $B(\cdot, s)$
solve the following ODE system:

$$B'(t, s) + k^1 B(t, s) + h^1\beta(t, s) B(t, s) - l^1\hat\nu'(-\beta(t, s)) B(t, s) = 0, \tag{10.78}$$
$$A'(t, s) + k^0 B(t, s) + h^0\beta(t, s) B(t, s) - l^0\hat\nu'(-\beta(t, s)) B(t, s) = 0, \tag{10.79}$$

with terminal conditions $A(s, s) = \gamma^0$, $B(s, s) = \gamma^1$. Again, (10.78) and (10.79) are
straightforward to evaluate numerically.

It is of course possible to compute the conditional expectation (10.76) by using
the Feynman-Kac formula (10.65) with $g \equiv 0$ and $h = \gamma$. However, in most cases
the ensuing PDE needs to be solved numerically.

Figure 10.8. (a) A typical trajectory of the basic affine jump diffusion model (10.75) and
(b) the corresponding jumps of $(Z_t)$. The parameter values used are given in Table 10.4; the
initial value $\Psi_0$ is equal to the long-run mean $\bar\theta + (l^0\mu)/\kappa$, which is marked by the horizontal
line.


Example 10.26 (defaultable zero-coupon bonds and CDSs). We now have all
the necessary ingredients to compute prices and credit spreads of defaultable zero-coupon
bonds and CDS spreads in a model with a doubly stochastic default time
with hazard rate $\gamma_t = \Psi_t$ for a one-dimensional affine jump diffusion $(\Psi_t)$. In Figure 10.9
we plot the credit spread for defaultable bonds for the recovery assumptions
discussed in Section 10.4.3. Note that, for $T \to t$, i.e. for time to maturity close to
zero, the spread tends to $c(t, t) = \delta\Psi_t > 0$, as claimed in Section 10.5.3; in particular,
the credit spread does not vanish as $T \to t$. This is in stark contrast to firm-value
models, where typically $c(t, t) = 0$, as was shown in Section 10.3.1. Note further
that, for $T - t$ large, under the RF assumption we obtain negative credit spreads,
which is clearly unrealistic. These negative credit spreads are caused by the fact that
under RF we obtain a payment of fixed size $1 - \delta$ immediately at default. If the
default-free interest rate $r$ is relatively large, it may happen that

$$E^Q\Big(\exp\Big(-\int_0^\tau r_s\,ds\Big)(1 - \delta)\, I_{\{\tau \le T\}}\Big) > E^Q\Big(\exp\Big(-\int_0^T r_s\,ds\Big) I_{\{\tau \le T\}}\Big),$$

even if $\delta > 0$. This stems from the fact that on the right-hand side discounting is done
over the whole period $[0, T]$ (as opposed to $[0, \tau]$), so that discounting has a large
impact on the value of the right-hand side, compensating the higher terminal pay-off.
In Figure 10.10 we have plotted the fair spreads for CDSs with and without accrued
payments for varying maturities, assuming that the risk-neutral hazard process is a
basic affine jump diffusion.

Figure 10.9. Spreads of defaultable zero-coupon bonds in the Duffie-Gârleanu model
(10.75) for various recovery assumptions. The parameters of $(\Psi_t)$ are given in Table 10.4;
the initial value is $\Psi_0 \approx 0.0533$. The risk-free interest rate and the loss given default are
deterministic and are given by $r = 6\%$ and $\delta = 0.5$. Note that under the RF recovery model,
the spread becomes negative for large times to maturity.

Figure 10.10. Fair CDS spreads in the Duffie-Gârleanu model (10.75) for a CDS contract
with semiannual premium payments and varying time to maturity. The parameters of $(\Psi_t)$
are given in Table 10.4; the initial value is $\Psi_0 \approx 0.0533$. The risk-free interest rate and the
loss given default are deterministic and are given by $r = 6\%$ and $\delta = 0.5$. Note that, for small
time to maturity, the fair swap spread is approximately equal to $\delta\Psi_0 \approx 2.7\%$.


Notes and Comments


The Feynman-Kac formula is discussed, for example, in Section 5.5 of Björk (2004)
and, at a slightly more technical level, in Section 5.7 of Karatzas and Shreve (1988).

Important original papers on affine models in term structure modelling are Duffie
and Kan (1996) for diffusion models and Duffie, Pan and Singleton (2000) for jump
diffusions. The latter paper also contains other applications of affine models, such
as the pricing of equity options under stochastic volatility and econometric issues
related to affine models. It should be mentioned that there is also a converse to
Proposition 10.23; if the conditional expectations $E(\exp(-\int_t^T R(\Psi_s)\,ds)\,e^{u\Psi_T} \mid \mathcal{F}_t)$
are all exponentially affine functions of $\Psi_t$, the process $(\Psi_t)$ is necessarily affine (see
Duffie and Kan (1996), Duffie, Filipović and Schachermayer (2003) or Chapter 10
of Filipović (2009) for details).

The mathematical properties of the CIR model are discussed in, for example,
Chapter 6.2 of Lamberton and Lapeyre (1996), where the explicit solution of the
Riccati equations in the CIR model (summarized by (10.69) and (10.70)) is also
derived. The model studied in Example 10.24 is akin to models proposed by Duffie 
and Singleton (1999). Problems related to the modelling of negative correlation 
between state variable processes in an affine setting are discussed in Section 5.8 of 
Lando (2004). Empirical work on affine models for defaultable bonds includes the 
publications of Duffee (1999) and Driessen (2005). 


11 


Portfolio Credit Risk Management 


This chapter is concerned with one-period models for credit portfolios with a view 
towards credit risk management issues for portfolios of largely non-traded credit 
products, such as the retail and commercial loans in the banking book of a typical 
bank. 

The main theme in our analysis is the modelling of the dependence structure of the 
default events. In fact, default dependence has a profound impact on the upper tail 
of the credit loss distribution for a large portfolio. This is illustrated in Figure 11.1, 
where we compare the loss distribution for a portfolio of 1000 firms that default 
independently (portfolio 1) with a more realistic portfolio of the same size where 
defaults are dependent (portfolio 2). In portfolio 2 defaults are weakly dependent, 
in the sense that the correlation between default events is approximately 0.5%. In 
both cases the default probability is approximately 1%, so, on average, we expect 
ten defaults. As will be seen in Section 11.5, the model applied to portfolio 2 can be 
viewed as a realistic model for the loss distribution of a homogeneous portfolio of 
1000 loans with a Standard & Poor’s rating of BB. We see clearly from Figure 11.1 
that the loss distribution of portfolio 2 is skewed and its right tail is substantially 
heavier than the right tail of the loss distribution of portfolio 1, illustrating the 
dramatic impact of default dependence. 

Note that there are good economic reasons for expecting dependence between 
defaults of different obligors. Most importantly, the financial health of a firm varies 
with randomly fluctuating macroeconomic factors such as changes in economic 
growth. Since different firms are affected by common macroeconomic factors, this 
creates dependence between their defaults. 

We begin our analysis with a discussion of threshold models in Section 11.1. 
These can be viewed as multivariate extensions of the Merton model considered 
in Chapter 10. In Section 11.2 we consider so-called mixture models in which 
defaults are assumed to be conditionally independent events given a set of common 
factors. The factors are usually interpreted as macroeconomic variables and are also 
modelled stochastically. Mixture models are commonly used in practice, essentially 
for tractability reasons; many threshold models also have convenient representations 
as mixture models. 

Sections 11.3 and 11.4 are concerned with the calculation or approximation of
the portfolio loss distribution and related measures of tail risk. We give asymptotic
approximations for tail probabilities and quantiles that hold in large, relatively
homogeneous portfolios (Section 11.3) and we discuss Monte Carlo methods for
estimating tail probabilities, risk measures and capital allocations in mixture models
(Section 11.4). Finally, in Section 11.5 we consider the important issue of statistical
inference for credit models.

Figure 11.1. Comparison of loss distributions for two homogeneous portfolios of 1000 loans
with a default probability of 1% and different dependence structures. In portfolio 1 defaults
are assumed to be independent; in portfolio 2 we assume a default correlation of 0.5%.
Portfolio 2 can be considered as representative for BB-rated loans. We clearly see that the
default dependence generates a loss distribution with a heavier right tail and a shift of the
mode to the left.


11.1 Threshold Models 


The models of this section are one-period models for portfolio credit risk inspired by
the firm-value models of Section 10.3. Their defining attribute is the idea that default
occurs for a company $i$ over the period $[0, T]$ if some critical rv $X_i$ lies below some
deterministic threshold $d_i$. The variable $X_i$ can take on different interpretations. In
a multivariate version of Merton's model, $X_i = X_{T,i}$ is a lognormally distributed
asset value at the time horizon $T$ and $d_i$ represents liabilities to be repaid at $T$. More
abstractly, $X_i$ is frequently viewed as a latent variable representing the credit quality
or "ability-to-pay" of obligor $i$.

The dependence among defaults stems from the dependence among the components
of the vector $X = (X_1, \ldots, X_m)'$. The distributions assumed for $X$ can, in
principle, be completely general, and indeed a major issue of this section will be the
influence of the copula of $X$ on the risk of the portfolio.


11.1.1 Notation for One-Period Portfolio Models 


It is convenient to introduce some notation for one-period portfolio models that
will be in force throughout the remainder of the chapter. We consider a portfolio
of $m$ obligors and fix a time horizon $T$. For $1 \le i \le m$, we let the rv $R_i$ be a
state indicator for obligor $i$ at time $T$ and assume that $R_i$ takes integer values in
the set $\{0, 1, \ldots, n\}$ representing, for example, rating classes; as in Section 10.2.1
we interpret the value 0 as default and non-zero values as states of increasing credit
quality. At time $t = 0$ obligors are assumed to be in some non-default state.

Mostly we will concentrate on the binary outcomes of default and non-default
and ignore the finer categorization of non-defaulted companies. In this case we
write $Y_i$ for the default indicator variables so that $Y_i = 1 \iff R_i = 0$ and
$Y_i = 0 \iff R_i > 0$. The random vector $Y = (Y_1, \ldots, Y_m)'$ is a vector of default
indicators for the portfolio, and $p(y) = P(Y_1 = y_1, \ldots, Y_m = y_m)$, $y \in \{0, 1\}^m$,
is its joint probability function; the marginal default probabilities are denoted by
$p_i = P(Y_i = 1)$, $i = 1, \ldots, m$.

The default or event correlations will be of particular interest to us; they are
defined to be the correlations of the default indicators. Because

$$\operatorname{var}(Y_i) = E(Y_i^2) - p_i^2 = E(Y_i) - p_i^2 = p_i - p_i^2,$$

we obtain, for firms $i$ and $j$ with $i \ne j$, the formula

$$\rho(Y_i, Y_j) = \frac{E(Y_i Y_j) - p_i p_j}{\sqrt{(p_i - p_i^2)(p_j - p_j^2)}}. \tag{11.1}$$

We count the number of defaulted obligors at time $T$ with the rv $M := \sum_{i=1}^m Y_i$. The
actual loss if company $i$ defaults (termed the loss given default (LGD) in practice)
is modelled by the random quantity $\delta_i e_i$, where $e_i$ represents the overall exposure
to company $i$ and $0 \le \delta_i \le 1$ represents a random proportion of the exposure that
is lost in the event of default. We will denote the overall loss by $L := \sum_{i=1}^m \delta_i e_i Y_i$
and make further assumptions about the $e_i$ and $\delta_i$ variables as and when we need
them.

It is possible to set up different credit risk models leading to the same multivariate
distribution for $R$ or $Y$. Since this distribution is the main object of interest in the
analysis of portfolio credit risk, we call two models with state vectors $R$ and $\tilde R$ (or
$Y$ and $\tilde Y$) equivalent if $R \overset{d}{=} \tilde R$ (or $Y \overset{d}{=} \tilde Y$).


The exchangeable special case. To simplify the analysis we will often assume that
the state indicator vector $R$, and thus the default indicator vector $Y$, are exchangeable
random vectors. This is one way to mathematically formalize the notion of
homogeneous groups that is used in practice. Recall that a random vector $R$ is
said to be exchangeable if $(R_1, \ldots, R_m) \overset{d}{=} (R_{\Pi(1)}, \ldots, R_{\Pi(m)})$ for any permutation
$(\Pi(1), \ldots, \Pi(m))$ of $(1, \ldots, m)$. Exchangeability implies in particular that,
for any $k \in \{1, \ldots, m - 1\}$, all of the $\binom{m}{k}$ possible $k$-dimensional marginal distributions
of $R$ are identical. In this situation we introduce a simple notation for default
probabilities where $\pi := P(Y_i = 1)$, $i \in \{1, \ldots, m\}$, is the default probability of
any firm and

$$\pi_k := P(Y_{i_1} = 1, \ldots, Y_{i_k} = 1), \quad \{i_1, \ldots, i_k\} \subset \{1, \ldots, m\}, \quad 2 \le k \le m, \tag{11.2}$$

is the joint default probability for any $k$ firms. In other words, $\pi_k$ is the probability
that an arbitrarily selected subgroup of $k$ companies defaults in $[0, T]$. When default
indicators are exchangeable, we get

$$E(Y_i) = E(Y_i^2) = P(Y_i = 1) = \pi, \quad \forall i,$$
$$E(Y_i Y_j) = P(Y_i = 1, Y_j = 1) = \pi_2, \quad \forall i \ne j,$$

so that $\operatorname{cov}(Y_i, Y_j) = \pi_2 - \pi^2$; this implies that the default correlation in (11.1) is
given by

$$\rho_Y := \rho(Y_i, Y_j) = \frac{\pi_2 - \pi^2}{\pi - \pi^2}, \quad i \ne j, \tag{11.3}$$

which is a simple function of the first- and second-order default probabilities.
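Formula (11.3) and its inverse are simple enough to state as two small helper functions; the names are hypothetical. The numerical inputs in the usage lines are the default probability of 1% and the default correlation of 0.5% quoted for portfolio 2 in the chapter introduction.

```python
def default_correlation(pi, pi2):
    """Default correlation (11.3) for exchangeable default indicators."""
    return (pi2 - pi ** 2) / (pi - pi ** 2)

def joint_default_probability(pi, rho_y):
    """Invert (11.3): the pi_2 implied by pi and a given default correlation."""
    return pi ** 2 + rho_y * (pi - pi ** 2)

# Joint default probability of two firms implied by pi = 1% and rho_Y = 0.5%.
pi2 = joint_default_probability(0.01, 0.005)
rho_back = default_correlation(0.01, pi2)
```

For independent defaults $\pi_2 = \pi^2$ and the default correlation vanishes; even a small positive default correlation raises $\pi_2$ noticeably relative to $\pi^2$.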


11.1.2 Threshold Models and Copulas 


We start with a general definition of a threshold model before discussing the link to 
copulas. 


Definition 11.1. Let $X = (X_1, \dots, X_m)'$ be an $m$-dimensional random vector and
let $D \in \mathbb{R}^{m \times n}$ be a deterministic matrix with elements $d_{ij}$ such that, for every $i$, the
elements of the $i$th row form a set of increasing thresholds satisfying $d_{i1} < \cdots < d_{in}$.
Augment these thresholds by setting $d_{i0} = -\infty$ and $d_{i(n+1)} = \infty$ for all obligors
and then set

$$R_i = j \iff d_{ij} < X_i \le d_{i(j+1)}, \quad j \in \{0, \dots, n\}, \quad i \in \{1, \dots, m\}.$$

Then $(X, D)$ is said to define a threshold model for the state vector
$R = (R_1, \dots, R_m)'$.


We refer to $X$ as the vector of critical variables and denote its marginal dfs by
$F_{X_i}(x) = P(X_i \le x)$. The $i$th row of $D$ contains the critical thresholds for firm $i$.
By definition, default (corresponding to the event $R_i = 0$) occurs if $X_i \le d_{i1}$, so
the default probability of company $i$ is given by $p_i = F_{X_i}(d_{i1})$. When working with
a default-only model we simply write $d_i = d_{i1}$ and denote the threshold model by
$(X, d)$.

In the context of such models it is important to distinguish the default correlation
$\rho(Y_i, Y_j)$ of two firms $i \ne j$ from the correlation of the critical variables $X_i$ and $X_j$.
Since the critical variables are often interpreted in terms of asset values, the latter
correlation is often referred to as asset correlation. For given default probabilities,
$\rho(Y_i, Y_j)$ is determined by $E(Y_i Y_j)$ according to (11.1). Moreover, in a threshold
model, $E(Y_i Y_j) = P(X_i \le d_{i1}, X_j \le d_{j1})$, which implies that default correlation
depends on the joint distribution of $X_i$ and $X_j$. If $X$ is multivariate normal, as in
many models used in practice, the correlation of $X_i$ and $X_j$ determines the copula
of their joint distribution and hence the default correlation (see Lemma 11.2 below).
For general critical variables outside the multivariate normal class, the correlation
of the critical variables does not fully determine the default correlation; this can
have serious implications for the tail of the distribution of $M = \sum_{i=1}^m Y_i$, as will be
shown in Section 11.1.5.

We now give a simple criterion for the equivalence of two threshold models in 
terms of the marginal distributions of the state vector R and the copula of X. This
result clarifies the central role of copulas in threshold models. For the necessary
background information on copulas we refer to Chapter 7.


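Lemma 11.2. Let $(X, D)$ and $(\tilde{X}, \tilde{D})$ be a pair of threshold models with state
vectors $R = (R_1, \dots, R_m)'$ and $\tilde{R} = (\tilde{R}_1, \dots, \tilde{R}_m)'$, respectively. The models are
equivalent if the following conditions hold.


(i) The marginal distributions of the random vectors $R$ and $\tilde{R}$ coincide, i.e.
$P(R_i = j) = P(\tilde{R}_i = j)$, $j \in \{1, \dots, n\}$, $i \in \{1, \dots, m\}$.
(ii) $X$ and $\tilde{X}$ admit the same copula $C$.


Proof. According to Definition 11.1, $R \stackrel{d}{=} \tilde{R}$ if and only if, for all
$j_1, \dots, j_m \in \{1, \dots, n\}$,

$$P(d_{1j_1} < X_1 \le d_{1(j_1+1)}, \dots, d_{mj_m} < X_m \le d_{m(j_m+1)})
= P(\tilde{d}_{1j_1} < \tilde{X}_1 \le \tilde{d}_{1(j_1+1)}, \dots, \tilde{d}_{mj_m} < \tilde{X}_m \le \tilde{d}_{m(j_m+1)}).$$

By standard measure-theoretic arguments this holds if, for all
$j_1, \dots, j_m \in \{1, \dots, n\}$,

$$P(X_1 \le d_{1j_1}, \dots, X_m \le d_{mj_m}) = P(\tilde{X}_1 \le \tilde{d}_{1j_1}, \dots, \tilde{X}_m \le \tilde{d}_{mj_m}).$$

By Sklar's Theorem (Theorem 7.3) this is equivalent to

$$C(F_{X_1}(d_{1j_1}), \dots, F_{X_m}(d_{mj_m})) = C(F_{\tilde{X}_1}(\tilde{d}_{1j_1}), \dots, F_{\tilde{X}_m}(\tilde{d}_{mj_m})),$$

where $C$ is the copula of $X$ and $\tilde{X}$ (using condition (ii)). Condition (i) implies
that $F_{X_i}(d_{ij}) = F_{\tilde{X}_i}(\tilde{d}_{ij})$ for all $j \in \{1, \dots, n\}$, $i \in \{1, \dots, m\}$, and the claim
follows.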


The result shows that in a threshold model the copula of the critical variables deter- 
mines the link between marginal probabilities of migration for individual firms and 
joint probabilities of migration for groups of firms. To illustrate this further, consider 
for simplicity a two-state model for default and non-default and a subgroup of $k$ companies
$\{i_1, \dots, i_k\} \subset \{1, \dots, m\}$ with individual default probabilities $p_{i_1}, \dots, p_{i_k}$.
Then

$$P(Y_{i_1} = 1, \dots, Y_{i_k} = 1) = P(X_{i_1} \le d_{i_1 1}, \dots, X_{i_k} \le d_{i_k 1})
= C_{i_1 \cdots i_k}(p_{i_1}, \dots, p_{i_k}), \tag{11.4}$$

where $C_{i_1 \cdots i_k}$ denotes the corresponding $k$-dimensional margin of $C$. As a special
case consider now a model for a single homogeneous group. We assume that $X$ has an
exchangeable copula (i.e. a copula of the form (7.20)) and that all individual default
probabilities are equal to some constant $\pi$ so that the default indicator vector $Y$ is
exchangeable. The formula (11.4) reduces to the useful formula

$$\pi_k = C_{1 \cdots k}(\pi, \dots, \pi), \quad 2 \le k \le m, \tag{11.5}$$

which will be used for the calibration of some copula models later on.


11.1.3 Gaussian Threshold Models

In this section we discuss the case where the critical variables have a Gauss copula. 


Multivariate Merton model. It is straightforward to generalize the Merton model
of Section 10.3.1 to a portfolio of $m$ firms. We assume that the multivariate asset-value
process $(V_t)$ with $V_t = (V_{t,1}, \dots, V_{t,m})'$ follows an $m$-dimensional geometric
Brownian motion with drift vector $\mu_V = (\mu_1, \dots, \mu_m)'$, vector of volatilities
$\sigma_V = (\sigma_1, \dots, \sigma_m)'$, and instantaneous correlation matrix $P$. This means that $(V_t)$ solves
the stochastic differential equations

$$dV_{t,i} = \mu_i V_{t,i} \, dt + \sigma_i V_{t,i} \, dW_{t,i}, \quad i = 1, \dots, m,$$

for correlated Brownian motions with correlation $\rho(W_{t,i}, W_{t,j}) = \rho_{ij}$, $t \ge 0$. For
all $i$ the asset value $V_{T,i}$ is thus of the form

$$V_{T,i} = V_{0,i} \exp\big((\mu_i - \tfrac{1}{2}\sigma_i^2) T + \sigma_i W_{T,i}\big),$$

and $W_T := (W_{T,1}, \dots, W_{T,m})'$ is a multivariate normal random vector satisfying
$W_T \sim N_m(0, TP)$. In its basic form the Merton model is a default-only model
in which the firm defaults if $V_{T,i} \le B_i$ and $B_i$ is the liability of firm $i$. Writing
$B = (B_1, \dots, B_m)'$, the threshold model representation is thus given by $(V_T, B)$.
Since in a threshold model the default event is invariant under strictly increasing
transformations of critical variables and thresholds, this model is equivalent to a
threshold model $(X, d)$ with

$$X_i = \frac{\ln V_{T,i} - \ln V_{0,i} - (\mu_i - \tfrac{1}{2}\sigma_i^2) T}{\sigma_i \sqrt{T}}, \qquad
d_i = \frac{\ln B_i - \ln V_{0,i} - (\mu_i - \tfrac{1}{2}\sigma_i^2) T}{\sigma_i \sqrt{T}}.$$

The transformed variables satisfy $X \sim N_m(0, P)$ and their copula is the Gauss
copula $C_P^{\mathrm{Ga}}$.
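
The standardization above can be sketched in a few lines of Python for a single firm. The
function names and the parameter values are hypothetical and serve only to illustrate the
formulas for $d_i$ and $p_i = \Phi(d_i)$; this is a sketch under the Merton assumptions, not
a calibration procedure.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def merton_threshold(v0, b, mu, sigma, horizon):
    """Standardized threshold d_i = (ln B_i - ln V_0i - (mu_i - sigma_i^2 / 2) T) / (sigma_i sqrt(T))."""
    return (log(b) - log(v0) - (mu - 0.5 * sigma ** 2) * horizon) / (sigma * sqrt(horizon))


def merton_default_probability(v0, b, mu, sigma, horizon):
    """p_i = P(X_i <= d_i) = Phi(d_i), because X_i is standard normal."""
    return NormalDist().cdf(merton_threshold(v0, b, mu, sigma, horizon))


def terminal_asset_value(v0, mu, sigma, horizon, w_t):
    """V_Ti for a given value w_t of the Brownian motion W_Ti."""
    return v0 * exp((mu - 0.5 * sigma ** 2) * horizon + sigma * w_t)


# Hypothetical firm: assets 100, liabilities 70, drift 5%, volatility 25%, one-year horizon.
v0, b, mu, sigma, horizon = 100.0, 70.0, 0.05, 0.25, 1.0
d = merton_threshold(v0, b, mu, sigma, horizon)
p = merton_default_probability(v0, b, mu, sigma, horizon)
```

Because the map from $V_{T,i}$ to $X_i$ is strictly increasing, the events $\{V_{T,i} \le B_i\}$
and $\{X_i \le d_i\}$ coincide; with $X_i = W_{T,i}/\sqrt{T}$ this can be checked for any value
passed to `terminal_asset_value`.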


Gaussian threshold models in practice. In practice it is usual to start directly with
threshold models of the form $(X, d)$ with $X \sim N_m(0, P)$. There are two practical
challenges: first, one has to calibrate the threshold vector $d$ (or, in the case of a
multi-state model, the threshold matrix $D$) in line with exogenously given default
and transition probabilities; second, one needs to calibrate the correlation matrix
$P$ in a parsimonious way. The problem of calibrating the obligor-specific rows of
the threshold matrix $D$ to (rating) state transition probabilities was discussed in
Section 10.3.4. In particular, in a default-only model we set $d_i = \Phi^{-1}(p_i)$ for given
default probabilities $p_i$, for $i = 1, \dots, m$. Since $X$ has standard normal margins, $P$
is also the covariance matrix of $X$.

In its most general form $P$ has $m(m-1)/2$ distinct parameters. In portfolio credit
risk applications $m$ is typically large and it is important to use a more parsimonious
parametrization of this matrix based on a factor model of the kind described in
Section 6.4.1. Factor models also lend themselves to economic interpretation, and the
factors are commonly interpreted as country and industry effects. We now describe
the mathematical form of typical factor models used in credit risk. The calibration
of these models in practice is discussed in Section 11.5.1.


Factor models. We assume that

$$X_i = \sqrt{\beta_i} \, \tilde{F}_i + \sqrt{1 - \beta_i} \, \varepsilon_i, \tag{11.6}$$

where $\tilde{F}_i$ and $\varepsilon_1, \dots, \varepsilon_m$ are independent standard normal variables, and where
$0 \le \beta_i \le 1$ for all $i$. In this formulation the $\tilde{F}_i$ are the systematic variables, which
are correlated, and the $\varepsilon_i$ are idiosyncratic variables. It follows that $\beta_i$ can be viewed
as a measure of the systematic risk of $X_i$: that is, the part of the variance of $X_i$ that
is explained by the systematic variable.

The systematic variables are assumed to be of the form $\tilde{F}_i = a_i' F$, where $F$ is a
vector of common factors satisfying $F \sim N_p(0, \Omega)$ with $p < m$, and where $\Omega$ is
a correlation matrix. These factors typically represent country and industry effects.
The assumption that $\operatorname{var}(\tilde{F}_i) = 1$ imposes the constraint that $a_i' \Omega a_i = 1$ for all
$i$. Since $\operatorname{var}(X_i) = 1$ and since $\tilde{F}_i$ and $\varepsilon_1, \dots, \varepsilon_m$ are independent and standard
normal, the asset correlations in this model are given by

$$\rho(X_i, X_j) = \operatorname{cov}(X_i, X_j) = \sqrt{\beta_i \beta_j} \operatorname{cov}(\tilde{F}_i, \tilde{F}_j) = \sqrt{\beta_i \beta_j} \, a_i' \Omega a_j.$$

In order to set up the model we have to determine $a_i$ and $\beta_i$ for each obligor as
well as $\Omega$, with the additional constraint that $a_i' \Omega a_i = 1$ for all $i$. Since $\Omega$ has
$p(p-1)/2$ parameters, the loading vectors $a_i$ and coefficients $\beta_i$ have a combined
total of $mp + m$ parameters, and we are applying $m$ constraints, the dimension of
the calibration problem is $mp + p(p-1)/2$. In particular, the number of parameters
grows linearly rather than quadratically in $m$.
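
To make the factor structure concrete, the following Python sketch assembles the asset
correlation matrix from the coefficients $\beta_i$, the loading vectors $a_i$ and the factor
correlation matrix $\Omega$, and counts the free parameters. All names and numerical values
are illustrative assumptions.

```python
from math import sqrt


def quadratic_form(a_i, omega, a_j):
    """Compute a_i' Omega a_j for plain Python lists."""
    return sum(a_i[k] * omega[k][l] * a_j[l]
               for k in range(len(a_i)) for l in range(len(a_j)))


def asset_correlation_matrix(betas, loadings, omega, tol=1e-9):
    """Asset correlations rho(X_i, X_j) = sqrt(beta_i beta_j) a_i' Omega a_j for i != j."""
    for a in loadings:
        # The constraint a_i' Omega a_i = 1 ensures var(F~_i) = 1.
        if abs(quadratic_form(a, omega, a) - 1.0) > tol:
            raise ValueError("loading vector violates a' Omega a = 1")
    m = len(betas)
    corr = [[1.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(m):
            if i != j:
                corr[i][j] = sqrt(betas[i] * betas[j]) * quadratic_form(loadings[i], omega, loadings[j])
    return corr


def parameter_count(m, p):
    """Dimension of the calibration problem: mp + p(p - 1)/2."""
    return m * p + p * (p - 1) // 2


# Hypothetical example with p = 2 factors and m = 3 obligors.
omega = [[1.0, 0.5], [0.5, 1.0]]
c = 1.0 / sqrt(3.0)  # chosen so that (c, c)' Omega (c, c) = 1
loadings = [[1.0, 0.0], [0.0, 1.0], [c, c]]
betas = [0.2, 0.3, 0.25]
corr = asset_correlation_matrix(betas, loadings, omega)
```

For instance, `parameter_count(1000, 5)` can be compared with the $m(m-1)/2$ entries of an
unrestricted correlation matrix for $m = 1000$ obligors to see the saving in parameters.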

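Note that this factor model does fit into the general framework developed in
Section 6.4.1. $X$ can be written as

$$X = BF + \tilde{\varepsilon}, \tag{11.7}$$

where $B = DA$, $D = \operatorname{diag}(\sqrt{\beta_1}, \dots, \sqrt{\beta_m})$, $A \in \mathbb{R}^{m \times p}$ is the matrix with $i$th row
given by $a_i'$, and $\tilde{\varepsilon}_i = \sqrt{1 - \beta_i} \, \varepsilon_i$.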

We often consider the special case of a one-factor model. This corresponds to
a model where $\tilde{F}_i = F$ for a single common standard normal factor so that the
equation in (11.6) takes the form

$$X_i = \sqrt{\beta_i} \, F + \sqrt{1 - \beta_i} \, \varepsilon_i. \tag{11.8}$$

If, moreover, every obligor has the same systematic variance $\beta_i = \rho$, we get that
$\rho(X_i, X_j) = \rho$ for all $i \ne j$. This model is often referred to as an equicorrelation
model and was introduced previously in Example 6.32 and equation (6.53).


11.1.4 Models Based on Alternative Copulas 


While most threshold models used in industry are based explicitly or implicitly on 
the Gauss copula, there is no reason why we have to assume a Gauss copula. In fact,
simulations presented in Section 11.1.5 show that the choice of copula may be very
critical to the tail of the distribution of the number of defaults M. We now look at 
threshold models based on alternative copulas. 

The first class of models attempts to preserve some of the flexibility of the Gaus- 
sian threshold models, which do have the appealing feature that they can accommo- 
date a wide range of different correlation structures for the critical variables. This is 
clearly an advantage in modelling a portfolio where obligors are exposed to several 
risk factors and where the exposure to different risk factors differs markedly across 
obligors, such as a portfolio of loans to companies from different industry sectors 
or countries. 


Example 11.3 (normal mean-variance mixtures). For the distribution of the critical
variables we consider the kind of model described in Section 6.2.2. We start with
an $m$-dimensional multivariate normal vector $Z \sim N_m(0, \Sigma)$ and a positive, scalar
rv $W$, which is independent of $Z$. The vector of critical variables $X$ is assumed to
have the structure

$$X = m(W) + \sqrt{W} Z, \tag{11.9}$$

where $m\colon [0, \infty) \to \mathbb{R}^m$ is a measurable function. In the special case where $m(W)$
takes a constant value $\mu$ not depending on $W$, the distribution is called a normal
variance mixture. An important example of a normal variance mixture is the multivariate
$t$ distribution, as discussed in Example 6.7, which is obtained when $W$ has
an inverse gamma distribution, $W \sim \mathrm{Ig}(\tfrac{1}{2}\nu, \tfrac{1}{2}\nu)$, or equivalently when $\nu / W \sim \chi^2_\nu$.
An example of a general mean-variance mixture is the GH distribution discussed in
Section 6.2.3.

In a normal mean-variance mixture model the default condition may be written
in the form

$$X_i \le d_{i1} \iff Z_i \le \frac{d_{i1}}{\sqrt{W}} - \frac{m_i(W)}{\sqrt{W}} =: \tilde{D}_i, \tag{11.10}$$

where $m_i(W)$ is the $i$th component of $m(W)$. A possible economic interpretation
of the model (11.9) is to consider $Z_i$ as the asset value of company $i$ and $d_{i1}$ as an
a priori estimate of the corresponding default threshold. The actual default threshold
is stochastic and is represented by $\tilde{D}_i$, which is obtained by applying a multiplicative
shock and an additive shock to the estimate $d_{i1}$. If we interpret this shock as a stylized
representation of global factors such as the overall liquidity and risk appetite in the
banking system, it makes sense to assume that the shocks to the default thresholds
of different obligors are driven by the same rv $W$.

Normal variance mixtures, such as the multivariate $t$, provide the most tractable
examples of normal mean-variance mixtures; they admit a calibration approach
using linear factor models that is similar to the approach used for models based on
the Gauss copula. In normal variance mixture models the correlation matrices of $X$
(when defined) and $Z$ coincide. Moreover, if $Z$ follows a linear factor model (11.7),
then $X$ inherits the linear factor structure from $Z$. Note, however, that the systematic
factors $\sqrt{W} F$ and the idiosyncratic factors $\sqrt{W} \varepsilon$ are no longer independent but
merely uncorrelated.



The class of threshold models based on the $t$ copula can be thought of as containing
the Gaussian threshold models as limiting cases when $\nu \to \infty$. However,
the additional parameter $\nu$ adds a great deal of flexibility. We will come back to this
point in Section 11.1.5.


Another class of parametric copulas that could be used in threshold models is the 
Archimedean family of Section 7.4. 


Example 11.4 (Archimedean copulas). Recall that an Archimedean copula is the
distribution function of a uniform random vector of the form

$$C(u_1, \dots, u_m) = \psi(\psi^{-1}(u_1) + \cdots + \psi^{-1}(u_m)), \tag{11.11}$$

where $\psi\colon [0, \infty) \to [0, 1]$ is a continuous, decreasing function, known as the
copula generator, satisfying $\psi(0) = 1$ and $\lim_{t \to \infty} \psi(t) = 0$, and $\psi^{-1}$ is its inverse.
We assume that $\psi$ is completely monotonic (see equation (7.47) and surrounding
discussion). As explained in Section 7.4, these conditions ensure that (11.11) defines
a copula for any portfolio size $m$. Our main example in this chapter is the Clayton
copula. Recall from Section 7.4 that this copula has generator $\psi_\theta(t) = (1 + \theta t)^{-1/\theta}$,
where $\theta > 0$, leading to the expression

$$C_\theta^{\mathrm{Cl}}(u_1, \dots, u_m) = (u_1^{-\theta} + \cdots + u_m^{-\theta} - m + 1)^{-1/\theta}. \tag{11.12}$$


As discussed in Section 7.4, exchangeable Archimedean copulas suffer from the 
deficiency that they are not rich in parameters and can model only exchangeable 
dependence and not a fully flexible dependence structure for the critical variables. 
Nonetheless, they yield useful parsimonious models for relatively small homoge- 
neous portfolios, which are easy to calibrate and simulate, as we discuss in more 
detail in Section 11.2.4. 

Suppose that $X$ is a random vector with an Archimedean copula and marginal
distributions $F_{X_i}$, $1 \le i \le m$, so that $(X, d)$ specifies a threshold model with individual
default probabilities $F_{X_i}(d_i)$. As a particular example consider the Clayton
copula and assume a homogeneous situation where all individual default probabilities
are identical to $\pi$. Using equations (11.5) and (11.12) we can calculate that
the probability that an arbitrarily selected group of $k$ obligors from a portfolio of $m$
such obligors defaults over the time horizon is given by $\pi_k = (k \pi^{-\theta} - k + 1)^{-1/\theta}$.
Essentially, the dependent default mechanism of the homogeneous group is now
determined by this equation and the parameters $\pi$ and $\theta$. We study this Clayton
copula model further in Example 11.13.
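
The closed form for $\pi_k$ can be checked against the general Archimedean construction
(11.11) with a short Python sketch. The function names and the parameter values
($\pi = 1\%$, $\theta = 0.05$) are illustrative assumptions.

```python
def clayton_generator(t, theta):
    """Clayton generator psi_theta(t) = (1 + theta t)^(-1/theta)."""
    return (1.0 + theta * t) ** (-1.0 / theta)


def clayton_generator_inverse(u, theta):
    """Inverse generator psi_theta^{-1}(u) = (u^(-theta) - 1) / theta."""
    return (u ** (-theta) - 1.0) / theta


def archimedean_copula(us, theta):
    """C(u_1, ..., u_k) = psi(psi^{-1}(u_1) + ... + psi^{-1}(u_k)) as in (11.11)."""
    return clayton_generator(sum(clayton_generator_inverse(u, theta) for u in us), theta)


def clayton_joint_default_probability(pi_1, theta, k):
    """pi_k = (k pi^(-theta) - k + 1)^(-1/theta) for a homogeneous group."""
    return (k * pi_1 ** (-theta) - k + 1.0) ** (-1.0 / theta)


# Hypothetical homogeneous group.
pi_1, theta = 0.01, 0.05
joint = {k: clayton_joint_default_probability(pi_1, theta, k) for k in range(1, 6)}
via_copula = {k: archimedean_copula([pi_1] * k, theta) for k in range(1, 6)}
```

The two dictionaries agree up to rounding, and `joint[2]` exceeds `pi_1 ** 2`, which reflects
the positive default dependence induced by the Clayton copula for $\theta > 0$.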


11.1.5 Model Risk Issues 


Model risk is the risk associated with working with misspecified models—in our 
case, models that are a poor representation of the true mechanism governing defaults 
and migrations in a credit portfolio. For example, if we intend to use our models to 
estimate measures of tail risk, like VaR and expected shortfall, then we should be 
particularly concerned with the possibility that they might underestimate the tail of 
the portfolio loss distribution.


Table 11.1. Results of simulation study. We tabulate the estimated 95th and 99th percentiles
of the distribution of M in an exchangeable model with 10 000 firms. The values for the default
probability π and the asset correlation ρ corresponding to the three groups A, B and C are
given in the text.

              q_0.95(M)                 q_0.99(M)
Group    ν=∞     ν=50    ν=10      ν=∞     ν=50    ν=10
A         14       23      24       21       49     118
B        109      153     239      157      261     589
C       1618     1723    2085     2206     2400    3067


As we have seen, a threshold model essentially consists of a collection of default 
(and migration) probabilities for individual firms and a copula that describes the 
dependence of certain critical variables. In discussing model risk in this context we 
will concentrate on models for default only and assume that individual default prob- 
abilities have been satisfactorily determined. It is much more difficult to determine 
the copula describing default dependence and we will look at model risk associ- 
ated with the misspecification of this component of the threshold model. See also 
Section 8.4.4 for a discussion of the issue of dependence uncertainty. 


The impact of the choice of copula. Since most threshold models used in practice 
use the Gauss copula, we are particularly interested in the sensitivity of the dis- 
tribution of the number of defaults M with respect to the assumption of Gaussian 
dependence. Our interest is motivated by the observation made in Section 7.3.1 that, 
by assuming a Gaussian dependence structure, we may underestimate the probabil- 
ity of joint large movements of risk factors, with potentially drastic implications for 
the performance of risk-management models. 

We compare a simple exchangeable model with multivariate normal critical variables
and a model where the critical variables are multivariate $t$. Given a standard
normal rv $F$, an iid sequence $\varepsilon_1, \dots, \varepsilon_m$ of standard normal variates independent
of $F$, and an asset correlation parameter $\rho \in [0, 1]$, we define a random vector $Z$
by $Z_i = \sqrt{\rho} F + \sqrt{1 - \rho} \, \varepsilon_i$. Observe that this is the equicorrelation special case of
the factor model (11.8).

In the $t$ copula case we define the critical variables $X_i := \sqrt{W} Z_i$, where
$W \sim \mathrm{Ig}(\tfrac{1}{2}\nu, \tfrac{1}{2}\nu)$ is independent of $Z$, so that $X$ has a multivariate $t$ distribution.
In the Gauss copula case we simply set $X := Z$. In both cases we choose thresholds
so that $P(Y_i = 1) = \pi$ for all $i$ and for some $\pi \in (0, 1)$. Note that the correlation
matrix $P$ of $X$ (the asset correlation matrix) is identical in both models and is given
by an equicorrelation matrix with off-diagonal element $\rho$. However, the copula of
$X$ differs, and we expect more joint defaults in the $t$ model due to the higher level
of dependence in the joint tail of the $t$ copula.

We consider three portfolios of decreasing credit quality, labelled A, B and C.
In group A we set $\pi = 0.06\%$ and $\rho = 2.58\%$; in group B we set $\pi = 0.50\%$
and $\rho = 3.80\%$; in group C we set $\pi = 7.50\%$ and $\rho = 9.21\%$. We consider
a portfolio of size $m = 10\,000$. For each group we vary the degrees-of-freedom
parameter $\nu$. In order to represent the tail of the number of defaults $M$, we use
simulations to determine (approximately) the 95% and 99% quantiles, $q_{0.95}(M)$ and
$q_{0.99}(M)$, and tabulate them in Table 11.1. The actual simulation was performed
using a representation of threshold models as Bernoulli mixture models that is
discussed later in Section 11.2.4.


Table 11.2. Results of simulation study. Estimated 95th and 99th percentiles of the
distribution of M in an exchangeable model for varying values of asset correlation ρ.

Quantile     ρ=2.58%    ρ=3.80%    ρ=9.21%
q_0.95(M)       98        109        148
q_0.99(M)      133        157        250


Table 11.1 shows that $\nu$ clearly has a massive influence on the high quantiles.
For the important 99% quantile the impact is most pronounced for group A, where
$q_{0.99}(M)$ is increased by a factor of almost six when we go from a Gaussian model
to a model with $\nu = 10$.
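
A scaled-down version of this experiment can be sketched in Python. The sketch conditions
on $F$ and $W$, so that defaults are independent with probability
$\Phi\big((d/\sqrt{W} - \sqrt{\rho} F)/\sqrt{1 - \rho}\big)$, which follows from
$X_i = \sqrt{W}(\sqrt{\rho} F + \sqrt{1 - \rho}\,\varepsilon_i) \le d$. Several choices are
assumptions made for illustration: the portfolio has 1000 rather than 10 000 firms, the
threshold $d$ in the $t$ case is calibrated by Monte Carlo and bisection instead of an exact
$t$ quantile, and all function names are hypothetical. The output will therefore not
reproduce Table 11.1.

```python
import random
from math import sqrt
from statistics import NormalDist

PHI = NormalDist()


def sample_scale(nu, rng):
    """Return sqrt(W), where nu / W is chi-squared(nu); return 1.0 in the Gaussian case."""
    if nu is None:
        return 1.0
    chi2 = rng.gammavariate(nu / 2.0, 2.0)  # chi-squared(nu) as gamma(shape nu/2, scale 2)
    return sqrt(nu / chi2)


def calibrate_threshold(pi_1, nu, rng, n_draws=10000):
    """Find d with P(sqrt(W) Z_i <= d) = pi_1 (requires pi_1 < 0.5)."""
    if nu is None:
        return PHI.inv_cdf(pi_1)
    scales = [sample_scale(nu, rng) for _ in range(n_draws)]
    lo, hi = -100.0, 0.0
    for _ in range(40):  # bisection on the Monte Carlo estimate of E(Phi(d / sqrt(W)))
        mid = 0.5 * (lo + hi)
        prob = sum(PHI.cdf(mid / s) for s in scales) / n_draws
        if prob < pi_1:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def simulate_quantiles(pi_1, rho, nu, m, n_sims, seed):
    """Empirical 95% and 99% order statistics of the number of defaults M."""
    rng = random.Random(seed)
    d = calibrate_threshold(pi_1, nu, rng)
    counts = []
    for _ in range(n_sims):
        f = rng.gauss(0.0, 1.0)
        scale = sample_scale(nu, rng)
        p_cond = PHI.cdf((d / scale - sqrt(rho) * f) / sqrt(1.0 - rho))
        counts.append(sum(1 for _ in range(m) if rng.random() < p_cond))
    counts.sort()
    return counts[int(0.95 * n_sims)], counts[int(0.99 * n_sims)]


# Group B parameters from the text, with a smaller portfolio for speed.
gauss_q95, gauss_q99 = simulate_quantiles(0.005, 0.038, None, m=1000, n_sims=1000, seed=1)
t_q95, t_q99 = simulate_quantiles(0.005, 0.038, 10, m=1000, n_sims=1000, seed=1)
```

Even in this reduced setting the high quantiles under $\nu = 10$ should lie well above the
Gaussian ones, in line with the qualitative message of Table 11.1.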


The impact of changing asset correlation. Here we retain the assumption that
$X$ has a Gauss copula and study the impact of the factor structure of the asset
returns on joint default events and hence on the tail of $M$. More specifically, we
increase the systematic risk component of the critical variables for the obligors in
our portfolio and analyse how this affects the tail of $M$. We use the exchangeable
model introduced above. We fix the default probability at $\pi = 0.50\%$ (the value
for group B above) and vary the asset correlation $\rho$ using the values $\rho = 2.58\%$,
$\rho = 3.80\%$ and $\rho = 9.21\%$. In Table 11.2 we tabulate $q_{0.95}(M)$ and $q_{0.99}(M)$ for
a portfolio with 10 000 counterparties. Clearly, varying $\rho$ also has a sizeable effect
on the quantiles of $M$. However, this effect is less dramatic and, in particular, less
surprising than the impact of varying the copula in our previous experiment.
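
The second experiment can be sketched in the same spirit. Conditional on $F$, defaults in
the Gaussian equicorrelation model are independent with probability
$\Phi\big((\Phi^{-1}(\pi) - \sqrt{\rho} F)/\sqrt{1 - \rho}\big)$. The portfolio size of 1000,
the number of simulations, the seed and the function names are assumptions chosen to keep
the sketch fast, so the numbers will differ from Table 11.2.

```python
import random
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def simulate_default_counts(pi_1, rho, m, n_sims, seed):
    """Simulate the number of defaults M in the one-factor Gaussian threshold model."""
    rng = random.Random(seed)
    d = STD_NORMAL.inv_cdf(pi_1)  # threshold with P(X_i <= d) = pi_1
    counts = []
    for _ in range(n_sims):
        f = rng.gauss(0.0, 1.0)  # systematic factor
        p_cond = STD_NORMAL.cdf((d - sqrt(rho) * f) / sqrt(1.0 - rho))
        counts.append(sum(1 for _ in range(m) if rng.random() < p_cond))
    return counts


def order_statistic_quantile(values, alpha):
    """Simple empirical quantile: the order statistic at position floor(alpha * n)."""
    ordered = sorted(values)
    return ordered[min(len(ordered) - 1, int(alpha * len(ordered)))]


quantiles = {}
for rho in (0.0258, 0.038, 0.0921):
    counts = simulate_default_counts(0.005, rho, m=1000, n_sims=2000, seed=7)
    quantiles[rho] = (order_statistic_quantile(counts, 0.95),
                      order_statistic_quantile(counts, 0.99))
```

Comparing `quantiles[0.0258]` with `quantiles[0.0921]` shows the tail of $M$ lengthening as
the systematic component of the critical variables grows.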

Both simulation experiments suggest that the loss distributions implied by thresh- 
old models are very sensitive to the copula of the critical variables. For this reason 
a substantial effort should be devoted to the calibration of the dependence model 
for the critical variables. Moreover, it is important to conduct sensitivity analyses to 
understand the implications of model risk for risk capital calculations. 


Notes and Comments 


Our presentation of threshold models is based, to a large extent, on Frey and McNeil 
(2001, 2003). In those papers we referred to the models as “latent variable” mod- 
els, because of structural similarities with statistical models of that name (see Joe 
1997). However, whereas in statistical latent variable models the critical variables 
are treated as unobserved, in credit models they are often formally identified, e.g. as 
asset values or asset-value returns. 

To see how the factor modelling approaches used in industry correspond to our 
presentation in Section 11.1.3 readers should consult the CreditMetrics technical 
document (RiskMetrics Group 1997) and the description of Moody’s GCorr model 
in Huang et al. (2012). The latter model is used to model correlations between
changes in credit quality for many different kinds of obligor including publicly
traded firms, private firms, small and medium-sized enterprises (SMEs) and retail 
borrowers. For public firms, weekly asset returns, calculated as part of the public- 
firm EDF methodology described in Section 10.3.3, are used as the measure of 
changing credit quality or “ability-to-pay”. 

The first systematic study of model risk for credit portfolio models is Gordy 
(2000). Our analysis of the impact of the copula of X on the tail of M follows Frey, 
McNeil and Nyfeler (2001). For an excellent discussion of various aspects of model 
risk in risk management in general, we refer to Gibson (2000). 


11.2 Mixture Models 


In a mixture model the default risk of an obligor is assumed to depend on a set of 
common factors, usually interpreted as macroeconomic variables, which are also 
modelled stochastically. Given a realization of the factors, defaults of individual 
firms are assumed to be independent. Dependence between defaults stems from 
the dependence of individual default probabilities on the set of common factors. 
We start with a general definition of a Bernoulli mixture model in Section 11.2.1 
before looking in detail at the important special case of one-factor Bernoulli mixture 
models in Section 11.2.2. In Section 11.2.4 we show that many threshold models 
can be represented as Bernoulli mixtures, and in Section 11.2.5 we discuss the 
approximation of Bernoulli mixture models through Poisson mixture models and 
the important example of CreditRisk+.


11.2.1 Bernoulli Mixture Models 


Definition 11.5 (Bernoulli mixture model). Given some $p < m$ and a $p$-dimensional
random vector $\Psi = (\Psi_1, \dots, \Psi_p)'$, the random vector $Y = (Y_1, \dots, Y_m)'$
follows a Bernoulli mixture model with factor vector $\Psi$ if there are functions
$p_i\colon \mathbb{R}^p \to [0, 1]$, $1 \le i \le m$, such that, conditional on $\Psi$, the components of
$Y$ are independent Bernoulli rvs satisfying $P(Y_i = 1 \mid \Psi = \psi) = p_i(\psi)$.


For $y = (y_1, \dots, y_m)'$ in $\{0, 1\}^m$ we have that

$$P(Y = y \mid \Psi = \psi) = \prod_{i=1}^m p_i(\psi)^{y_i} (1 - p_i(\psi))^{1 - y_i}, \tag{11.13}$$

and the unconditional distribution of the default indicator vector $Y$ is obtained by
integrating over the distribution of the factor vector $\Psi$. In particular, the default
probability of company $i$ is given by $p_i = P(Y_i = 1) = E(p_i(\Psi))$.

Note that the two-stage hierarchical structure of a Bernoulli mixture model facil- 
itates sampling from the model: first we generate the economic factor realizations, 
then we generate the pattern of defaults conditional on those realizations. The second 
step is easy because of the conditional independence assumption. 
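
The two-stage scheme can be sketched as follows in Python. The interface (a list of
conditional default probability functions and a factor sampler) and the probit-normal
example with $\mu = -2$ and $\sigma = 0.5$ are illustrative assumptions, not a prescribed
implementation.

```python
import random
from statistics import NormalDist

PHI = NormalDist()


def sample_default_indicators(cond_prob_fns, sample_factor, rng):
    """Draw one default indicator vector Y from a Bernoulli mixture model."""
    psi = sample_factor(rng)  # stage 1: economic factor realization
    # Stage 2: conditionally independent Bernoulli draws with probabilities p_i(psi).
    return [1 if rng.random() < p(psi) else 0 for p in cond_prob_fns]


def probit_normal(mu, sigma):
    """Hypothetical conditional default probability p(psi) = Phi(mu + sigma * psi)."""
    return lambda psi: PHI.cdf(mu + sigma * psi)


rng = random.Random(2024)
cond_prob_fns = [probit_normal(-2.0, 0.5) for _ in range(50)]
draws = [sample_default_indicators(cond_prob_fns, lambda r: r.gauss(0.0, 1.0), rng)
         for _ in range(2000)]
default_rate = sum(map(sum, draws)) / (2000 * 50)
```

Averaged over many draws, `default_rate` estimates the unconditional default probability
$E(p_i(\Psi))$; within a single draw, defaults cluster because all obligors share the same
factor realization.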

In general, Bernoulli mixture models have a number of computational advantages.
Consider the portfolio loss $L = \sum_{i=1}^m e_i \delta_i Y_i$ in the case where the exposures $e_i$ and
LGDs $\delta_i$ are deterministic. While it is difficult to compute the df $F_L$ of $L$, it is easy to
use the conditional independence of the defaults to show that the Laplace-Stieltjes
transform of $F_L$ for $t \in \mathbb{R}$ is given by

$$\hat{F}_L(t) = E(e^{-tL}) = E\big(E(e^{-tL} \mid \Psi)\big)
= E\Big(E\Big(\exp\Big(-t \sum_{i=1}^m e_i \delta_i Y_i\Big) \Bigm| \Psi\Big)\Big)$$
$$= E\Big(\prod_{i=1}^m E(e^{-t e_i \delta_i Y_i} \mid \Psi)\Big)
= E\Big(\prod_{i=1}^m \big(p_i(\Psi) e^{-t e_i \delta_i} + 1 - p_i(\Psi)\big)\Big),$$

which can also be obtained by integrating over the distribution of the factors $\Psi$.
The Laplace-Stieltjes transform is useful in a number of practical tasks relating to
Bernoulli mixture models, as follows.


• To implement an efficient Monte Carlo scheme for sampling losses from
a Bernoulli mixture model we often use importance sampling. For this we
need the moment-generating function of $L$, which can be calculated from the
Laplace-Stieltjes transform according to $M_L(t) = E(e^{tL}) = \hat{F}_L(-t)$ (see
Section 11.4 for more details).


• The probability mass function of $L$ may be calculated by using the inverse
Fourier transform to invert the characteristic function of $L$ given by
$\phi_L(t) = E(e^{itL})$. The characteristic function has the same functional form as the
Laplace-Stieltjes transform $\hat{F}_L(t)$ but with the imaginary argument $-it$ (see
also the discussion after Theorem 11.16).
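
For a one-factor model with $\Psi \sim N(0, 1)$, the outer expectation in the
Laplace-Stieltjes transform can be approximated by one-dimensional numerical integration.
The following Python sketch uses the trapezoidal rule; the grid settings, the function names
and the three-obligor portfolio are assumptions made for illustration only.

```python
from math import exp
from statistics import NormalDist

PHI = NormalDist()


def conditional_transform(t, psi, exposures, lgds, cond_prob_fns):
    """Product over obligors of p_i(psi) exp(-t e_i delta_i) + 1 - p_i(psi)."""
    product = 1.0
    for e, delta, p in zip(exposures, lgds, cond_prob_fns):
        p_i = p(psi)
        product *= p_i * exp(-t * e * delta) + 1.0 - p_i
    return product


def laplace_stieltjes(t, exposures, lgds, cond_prob_fns, n_grid=2001, bound=8.0):
    """Approximate E(exp(-t L)) by integrating over Psi ~ N(0, 1) on [-bound, bound]."""
    step = 2.0 * bound / (n_grid - 1)
    total = 0.0
    for j in range(n_grid):
        psi = -bound + j * step
        weight = 0.5 if j in (0, n_grid - 1) else 1.0  # trapezoidal end-point weights
        total += weight * conditional_transform(t, psi, exposures, lgds, cond_prob_fns) * PHI.pdf(psi)
    return total * step


def moment_generating_function(t, exposures, lgds, cond_prob_fns):
    """M_L(t) = F^_L(-t)."""
    return laplace_stieltjes(-t, exposures, lgds, cond_prob_fns)


# Hypothetical portfolio with probit-normal conditional default probabilities.
exposures = [1.0, 2.0, 0.5]
lgds = [0.6, 0.4, 1.0]
cond_prob_fns = [lambda psi, mu=mu: PHI.cdf(mu + 0.5 * psi) for mu in (-2.0, -1.5, -2.5)]
transform_at_one = laplace_stieltjes(1.0, exposures, lgds, cond_prob_fns)
mgf_at_half = moment_generating_function(0.5, exposures, lgds, cond_prob_fns)
```

The same routine evaluated at a complex argument would give the characteristic function,
although that extension is not shown here.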


11.2.2 One-Factor Bernoulli Mixture Models 


One-factor models, i.e. models where W is one dimensional, are particularly impor- 
tant special cases because of their tractability. Their behaviour for large portfolios 
is particularly easy to understand, as will be shown in Section 11.3, and this has 
had an influence on the Basel capital framework. Moreover, they have relatively 
few parameters and are thus easier to estimate from data. Throughout this section
we consider an rv $\Psi$ with values in $\mathbb{R}$ and functions $p_i(\psi)\colon \mathbb{R} \to [0, 1]$ such that,
conditional on $\Psi$, the default indicator $Y$ is a vector of independent Bernoulli rvs
with $P(Y_i = 1 \mid \Psi = \psi) = p_i(\psi)$. We now consider a variety of special cases.


Exchangeable Bernoulli mixture models. A further simplification occurs if the 
functions p; are all identical. In this case the Bernoulli mixture model is termed 
exchangeable, since the random vector $Y$ is exchangeable. It is convenient to introduce
the rv $Q := p_1(\Psi)$ and to denote the distribution function of this mixing
variable by $G(q)$. Conditional on $Q = q$, the number of defaults $M$ is the sum of
$m$ independent Bernoulli variables with parameter $q$ and it therefore has a binomial
distribution with parameters $q$ and $m$, i.e. $P(M = k \mid Q = q) = \binom{m}{k} q^k (1 - q)^{m-k}$.
The unconditional distribution of $M$ is obtained by integrating over $q$. We have

$$P(M = k) = \binom{m}{k} \int_0^1 q^k (1 - q)^{m-k} \, dG(q). \tag{11.14}$$

Using the notation of Section 11.1.1 we can calculate default probabilities and joint
default probabilities for the exchangeable group. Simple calculations give
$\pi = E(Y_1) = E(E(Y_1 \mid Q)) = E(Q)$ and, more generally,

$$\pi_k = P(Y_1 = 1, \dots, Y_k = 1) = E(E(Y_1 \cdots Y_k \mid Q)) = E(Q^k), \tag{11.15}$$

so that unconditional default probabilities of first and higher order are seen to be
moments of the mixing distribution. Moreover, for $i \ne j$,
$\operatorname{cov}(Y_i, Y_j) = \pi_2 - \pi^2 = \operatorname{var}(Q) \ge 0$, which means that in an exchangeable Bernoulli
mixture model the default correlation $\rho_Y$ defined in (11.3) is always non-negative. Any
value of $\rho_Y$ in $[0, 1]$ can be obtained by an appropriate choice of the mixing distribution
$G$. In particular, if $\rho_Y = \operatorname{var}(Q) = 0$, the rv $Q$ has a degenerate distribution with all
mass concentrated on the point $\pi$ and the default indicators are independent. The
case $\rho_Y = 1$ corresponds to a model where $\pi = \pi_2$ and the distribution of $Q$ is
concentrated on the points 0 and 1.


Example 11.6 (beta, probit-normal and logit-normal mixtures). The following 
mixing distributions are frequently encountered in Bernoulli mixture models. 


Beta mixing distribution. In this model Q ~ Beta(a, b) for some parameters a > 
0 and b > 0. See Section A.2.1 for more details concerning the beta distribution. 


Probit-normal mixing distribution. Here, $Q = \Phi(\mu + \sigma \Psi)$ for $\Psi \sim N(0, 1)$,
$\mu \in \mathbb{R}$ and $\sigma > 0$, where $\Phi$ is the standard normal distribution function. We show
later, in Section 11.2.4, that this model is equivalent to an exchangeable version
of the one-factor Gaussian threshold model in (11.8).


Logit-normal mixing distribution. Here, $Q = F(\mu + \sigma \Psi)$ for $\Psi \sim N(0, 1)$,
$\mu \in \mathbb{R}$ and $\sigma > 0$, where $F(x) = (1 + e^{-x})^{-1}$ is the df of a so-called logistic
distribution.


In the model with beta mixing distribution, the higher-order default probabilities
$\pi_k$ and the distribution of $M$ can be computed explicitly (see Example 11.7 below).
Calculations for the logit-normal, probit-normal and other models generally require
numerical evaluation of the integrals in (11.14) and (11.15). If we fix any two of
$\pi$, $\pi_2$ and $\rho_Y$ in a beta, logit-normal or probit-normal model, then this fixes the
parameters $a$ and $b$ or $\mu$ and $\sigma$ of the mixing distribution, and higher-order joint
default probabilities are automatically determined.
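
As an illustration of such a numerical evaluation, the following Python sketch approximates
$\pi_k = E(Q^k)$ in the probit-normal model by the trapezoidal rule over $\Psi \sim N(0, 1)$.
The grid settings, the function name and the parameter values are assumptions; any standard
quadrature routine could be used instead.

```python
from statistics import NormalDist

PHI = NormalDist()


def probit_normal_joint_default_probability(k, mu, sigma, n_grid=4001, bound=8.0):
    """Approximate pi_k = E(Phi(mu + sigma * Psi)^k) for Psi ~ N(0, 1)."""
    step = 2.0 * bound / (n_grid - 1)
    total = 0.0
    for j in range(n_grid):
        psi = -bound + j * step
        weight = 0.5 if j in (0, n_grid - 1) else 1.0  # trapezoidal end-point weights
        total += weight * PHI.cdf(mu + sigma * psi) ** k * PHI.pdf(psi)
    return total * step


# Hypothetical parameters of the mixing distribution.
mu, sigma = -2.0, 0.5
pi_1 = probit_normal_joint_default_probability(1, mu, sigma)
pi_2 = probit_normal_joint_default_probability(2, mu, sigma)
rho_y = (pi_2 - pi_1 ** 2) / (pi_1 - pi_1 ** 2)  # default correlation (11.3)
```

For $k = 1$ the integral has the known closed form $\Phi(\mu / \sqrt{1 + \sigma^2})$, which
gives a convenient check on the quadrature.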


Example 11.7 (beta mixing distribution). By definition, the density of a beta
distribution is given by

$$g(q) = \frac{1}{\beta(a, b)} q^{a-1} (1 - q)^{b-1}, \quad a, b > 0, \quad 0 < q < 1,$$

where $\beta(a, b)$ denotes the beta function. Below we use the fact that the beta function
satisfies the recursion formula $\beta(a + 1, b) = (a/(a + b)) \beta(a, b)$; this is easily
established from the representation of the beta function in terms of the gamma
function in Section A.2.1. Using (11.15), for the higher-order default probabilities
we obtain

$$\pi_k = \frac{1}{\beta(a, b)} \int_0^1 q^k q^{a-1} (1 - q)^{b-1} \, dq = \frac{\beta(a + k, b)}{\beta(a, b)}, \quad k = 1, 2, \dots.$$

The recursion formula for the beta function yields $\pi_k = \prod_{j=0}^{k-1} (a + j)/(a + b + j)$;
in particular, $\pi = a/(a + b)$, $\pi_2 = \pi (a + 1)/(a + b + 1)$ and $\rho_Y = (a + b + 1)^{-1}$.
The rv $M$ has a so-called beta-binomial distribution. From (11.14) we obtain

$$P(M = k) = \binom{m}{k} \frac{1}{\beta(a, b)} \int_0^1 q^{k+a-1} (1 - q)^{m-k+b-1} \, dq
= \binom{m}{k} \frac{\beta(a + k, b + m - k)}{\beta(a, b)}. \tag{11.16}$$
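
These closed-form expressions are straightforward to evaluate. The following Python sketch
computes $\pi_k$ and the beta-binomial probabilities on the log scale (to avoid overflow in
the binomial coefficient for large $m$); the function names and the parameters $a = 1$,
$b = 99$, $m = 100$ are illustrative assumptions.

```python
from math import exp, lgamma


def log_beta(a, b):
    """Logarithm of the beta function via its gamma-function representation."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def joint_default_probability(k, a, b):
    """pi_k = prod_{j=0}^{k-1} (a + j) / (a + b + j)."""
    prob = 1.0
    for j in range(k):
        prob *= (a + j) / (a + b + j)
    return prob


def beta_binomial_pmf(k, m, a, b):
    """P(M = k) from (11.16), evaluated on the log scale."""
    log_binomial = lgamma(m + 1) - lgamma(k + 1) - lgamma(m - k + 1)
    return exp(log_binomial + log_beta(a + k, b + m - k) - log_beta(a, b))


# Hypothetical parameters: default probability a / (a + b) = 1%.
a, b, m = 1.0, 99.0, 100
pi_1 = joint_default_probability(1, a, b)
pi_2 = joint_default_probability(2, a, b)
rho_y = (pi_2 - pi_1 ** 2) / (pi_1 - pi_1 ** 2)
pmf = [beta_binomial_pmf(k, m, a, b) for k in range(m + 1)]
```

Here `rho_y` agrees with $(a + b + 1)^{-1}$ up to rounding, and the entries of `pmf` sum to one.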


One-factor models with covariates. It is straightforward to extend the one-factor 
probit-normal and logit-normal mixture models to include covariates that influence 
default probability and default correlation; these covariates might be indicators for 
group membership, such as a rating class or industry sector, or key ratios taken from 
a company’s balance sheet. 

Writing $x_i \in \mathbb{R}^k$ for a vector of deterministic covariates, a general model for the
conditional default probabilities $p_i(\psi)$ in (11.13) would be to assume that

$$p_i(\psi) = h(\mu_i + \sigma_i \psi), \qquad
\mu_i = \mu + \beta' x_i, \qquad
\sigma_i = \exp(\delta + \gamma' x_i), \tag{11.17}$$

where $\Psi \sim N(0, 1)$, $h(x) = \Phi(x)$ or $h(x) = (1 + e^{-x})^{-1}$, the vectors
$\beta = (\beta_1, \dots, \beta_k)'$ and $\gamma = (\gamma_1, \dots, \gamma_k)'$ contain regression parameters, and $\mu \in \mathbb{R}$ and
$\delta \in \mathbb{R}$ are intercept parameters. Similar specifications are commonly used in the
class of generalized linear models in statistics (see Section 11.5.3).

Clearly, if $x_i = x$ for all $i$, so that all risks have the same covariates, then we are
back in the situation of full exchangeability. Note also that, since the function $p_i(\psi)$
is increasing in $\psi$, the conditional default probabilities $(p_1(\Psi), \dots, p_m(\Psi))$ form
a comonotonic random vector; hence, in a state of the world where the default probability
is comparatively high for one counterparty, it is high for all counterparties.
For a discussion of comonotonicity we refer to Section 7.2.1.


Example 11.8 (model for several exchangeable groups). The regression structure 
in (11.17) includes partially exchangeable models, where we define a number of 
groups within which risks are exchangeable. These groups might represent rating 
classes according to some internal or rating-agency classification. 

If the covariates $x_i$ are simply $k$-dimensional unit vectors of the form $x_i = e_{r(i)}$,
where $r(i) \in \{1, \ldots, k\}$ indicates, say, the rating class of firm $i$, then the
model (11.17) can be written in the form

\[
p_i(\Psi) = h(\mu_{r(i)} + \sigma_{r(i)} \Psi) \tag{11.18}
\]

for parameters $\mu_r := \mu + \beta_r$ and $\sigma_r := e^{\delta + \gamma_r}$ for $r = 1, \ldots, k$.
Inserting this specification into (11.13) allows us to find the conditional distribution
of the default indicator vector. Suppose there are $m_r$ obligors in rating category
$r$ for $r = 1, \ldots, k$, and write $M_r$ for the number of defaults. The conditional
distribution of the vector $M = (M_1, \ldots, M_k)'$ is given by

\[
P(M = l \mid \Psi = \psi) = \prod_{r=1}^{k} \binom{m_r}{l_r} \bigl(h(\mu_r + \sigma_r \psi)\bigr)^{l_r} \bigl(1 - h(\mu_r + \sigma_r \psi)\bigr)^{m_r - l_r}, \tag{11.19}
\]

where $l = (l_1, \ldots, l_k)'$. A model of the form (11.19) with $\sigma_1 = \cdots = \sigma_k$ will be
fitted to Standard & Poor's default data in Section 11.5.4. The asymptotic behaviour
of such a model (when $m$ is large) is investigated in Example 11.20.



11.2.3 Recovery Risk in Mixture Models 


In standard portfolio risk models it is assumed that the loss given default is inde- 
pendent of the default event. This is likely to be an oversimplification, as economic 
intuition suggests that recovery rates depend on risk factors similar to those for 
default probabilities; in that case one speaks of systematic recovery risk. Consider, 
for instance, the market for mortgages. During a property crisis many mortgages 
default. At the same time property prices are low, so that real estate can be sold 
only for very low prices in a foreclosure (a forced sale in which a bank liquidates a 
property it holds as collateral), so that recovery rates are low. 

The presence of systematic recovery risk is confirmed in a number of empirical 
studies. Among others, Frye (2000) has carried out a formal empirical analysis 
using recovery data collected by Moody’s on rated corporate bonds. He found that 
recovery rates are substantially lower than average in times of economic recession. 
To quote from his paper: 


Using that data [the Moody’s data] to estimate an appropriate credit 
model, we can extrapolate that in a severe economic downturn recover- 
ies might decline 20-25 percentage points from the normal-year aver- 
age. This could cause loss given default to increase by nearly 100% and 
to have a similar effect on economic capital. Such systematic recovery 
risk is absent from first-generation credit risk models. Therefore these 
models may significantly understate the capital required at banking 
institutions. 


In a similar vein, Hamilton et al. (2005) estimated formal models for the relationship
between the one-year default rate $q$ and the recovery rate $R$ for corporate bonds; according
to their analysis the best-fitting relationship is $R(q) \approx (0.52 - 6.9q)^+$.

Clearly, these findings call for the inclusion of systematic recovery risk in standard
credit risk models. This is easily accomplished in the mixture-model framework
(we replace the constant $\delta_i$ with some function $\delta_i(\psi)$), but the challenge lies in
the estimation of the function $\delta_i(\cdot)$ describing the relationship between loss given
default and the systematic factors.

11.2.4 Threshold Models as Mixture Models


Although the mixture models of this section seem, at first glance, to be different in 
structure from the threshold models of Section 11.1, it is important to realize that the 
majority of useful threshold models, including all the examples we have given, can 
be represented as Bernoulli mixture models. This is a very useful insight, because 
the Bernoulli mixture format has a number of advantages over the threshold format. 


• Bernoulli mixture models lend themselves to Monte Carlo risk studies. From
the analyses of this section we obtain methods for sampling from many of
the models we have discussed, such as the t copula threshold model used in
Section 11.1.5.

• Mixture models are arguably more convenient for statistical fitting purposes.
We show in Section 11.5.3 that statistical techniques for generalized linear
mixed models can be used to fit mixture models to empirical default data
gathered over several time periods.

• The large-portfolio behaviour of Bernoulli mixtures can be analysed and
understood in terms of the behaviour of the distribution of the common
economic factors, as will be shown in Section 11.3.


To motivate the subsequent analysis we begin by computing the mixture model
representation of the simple one-factor Gaussian threshold model in (11.8). It is
convenient to identify the variable $\Psi$ in the mixture representation with minus the
factor $F$ in the threshold representation; this yields conditional default probabilities
that are increasing in $\Psi$ and leads to formulas that are in line with the Basel IRB
formula. With $F = -\Psi$ the one-factor model takes the form

\[
X_i = -\sqrt{\beta_i}\,\Psi + \sqrt{1 - \beta_i}\,\varepsilon_i.
\]

By definition, company $i$ defaults if and only if $X_i \le d_i$ and hence if and only if
$\sqrt{1-\beta_i}\,\varepsilon_i \le d_i + \sqrt{\beta_i}\,\Psi$. Since the variables $\varepsilon_1, \ldots, \varepsilon_m$ and $\Psi$ are independent,
default events are independent conditional on $\Psi$ and we can compute

\[
p_i(\psi) = P(Y_i = 1 \mid \Psi = \psi) = P\bigl(\sqrt{1-\beta_i}\,\varepsilon_i \le d_i + \sqrt{\beta_i}\,\psi \mid \Psi = \psi\bigr)
          = \Phi\!\left(\frac{d_i + \sqrt{\beta_i}\,\psi}{\sqrt{1-\beta_i}}\right),
\]

where we have used the fact that $\varepsilon_i$ is standard normally distributed. The threshold
is typically set so that the default probability matches an exogenously chosen value
$p_i$, so that $d_i = \Phi^{-1}(p_i)$. In that case we obtain

\[
p_i(\psi) = \Phi\!\left(\frac{\Phi^{-1}(p_i) + \sqrt{\beta_i}\,\psi}{\sqrt{1-\beta_i}}\right). \tag{11.20}
\]

In the following we want to extend this idea to more general threshold models
with a factor structure for the critical variables. We give a condition that ensures that
a threshold model can be written as a Bernoulli mixture model.

Definition 11.9. A random vector $X$ has a p-dimensional conditional independence
structure with conditioning variable $\Psi$ if there is some $p < m$ and a $p$-dimensional
random vector $\Psi = (\Psi_1, \ldots, \Psi_p)'$ such that, conditional on $\Psi$, the rvs $X_1, \ldots, X_m$
are independent.

In the motivating example the conditioning variable was taken to be $\Psi = -F$.
The next lemma generalizes the computations in (11.20) to any threshold model
with a conditional independence structure.

Lemma 11.10. Let $(X, d)$ be a threshold model for an $m$-dimensional random
vector $X$. If $X$ has a $p$-dimensional conditional independence structure with
conditioning variable $\Psi$, then the default indicators $Y_i = I_{\{X_i \le d_i\}}$ follow a Bernoulli
mixture model with factor $\Psi$, where the conditional default probabilities are given
by $p_i(\psi) = P(X_i \le d_i \mid \Psi = \psi)$.


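Proof. For $y \in \{0, 1\}^m$ define the set $B := \{1 \le i \le m : y_i = 1\}$ and let
$B^c = \{1, \ldots, m\} \setminus B$. We have

\[
\begin{aligned}
P(Y = y \mid \Psi = \psi) &= P\Bigl(\bigcap_{i \in B} \{X_i \le d_i\} \cap \bigcap_{i \in B^c} \{X_i > d_i\} \Bigm| \Psi = \psi\Bigr) \\
&= \prod_{i \in B} P(X_i \le d_i \mid \Psi = \psi) \prod_{i \in B^c} \bigl(1 - P(X_i \le d_i \mid \Psi = \psi)\bigr).
\end{aligned}
\]

Hence, conditional on $\Psi = \psi$, the $Y_i$ are independent Bernoulli variables with
success probability $p_i(\psi) := P(X_i \le d_i \mid \Psi = \psi)$.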


We now consider a number of examples. 


Example 11.11 (Gaussian threshold model). Consider the general Gaussian
threshold model with the factor structure in (11.6), which takes the form

\[
X_i = \sqrt{\beta_i}\, a_i' F + \sqrt{1 - \beta_i}\,\varepsilon_i, \tag{11.22}
\]

where $\varepsilon_1, \ldots, \varepsilon_m$ are iid standard normal and where $\mathrm{var}(a_i' F) = 1$ for all $i$.
Conditional on $\Psi = -F$, the vector $X$ is normally distributed with diagonal covariance
matrix and thus has a conditional independence structure. With $d_i = \Phi^{-1}(p_i)$ the
conditional default probabilities are given by

\[
p_i(\psi) = P(Y_i = 1 \mid \Psi = \psi) = P\bigl(\sqrt{1-\beta_i}\,\varepsilon_i \le d_i + \sqrt{\beta_i}\, a_i'\psi\bigr)
          = \Phi\!\left(\frac{\Phi^{-1}(p_i) + \sqrt{\beta_i}\, a_i'\psi}{\sqrt{1-\beta_i}}\right). \tag{11.23}
\]

By comparison with Example 11.6 we see that the individual stochastic default
probabilities $p_i(\Psi)$ have a probit-normal distribution with parameters $\mu_i$ and $\sigma_i$
given by

\[
\mu_i = \Phi^{-1}(p_i)/\sqrt{1-\beta_i} \quad\text{and}\quad \sigma_i^2 = \beta_i/(1-\beta_i).
\]

Example 11.12 (Student t threshold model). Now consider the case where the
critical variables are of the form $X = \sqrt{W} Z$, where $Z$ follows the Gaussian factor
model in (11.22) and $W \sim \mathrm{Ig}(\tfrac12\nu, \tfrac12\nu)$. The vector $X$ has a multivariate t distribution
with $\nu$ degrees of freedom and standard univariate t margins with $\nu$ degrees of
freedom.

This time we condition on $\Psi = (-F', W)'$. Given $\Psi = (\psi, w)$, the vector $X$ has
a multivariate normal distribution with independent components, and a computation
similar to that in the previous example gives

\[
p_i(\psi, w) = \Phi\!\left(\frac{t_\nu^{-1}(p_i)\, w^{-1/2} + \sqrt{\beta_i}\, a_i'\psi}{\sqrt{1-\beta_i}}\right). \tag{11.24}
\]

The formulas (11.23) and (11.24) are useful for Monte Carlo simulation of
the corresponding threshold models. For example, rather than simulating an
$m$-dimensional t distribution to implement the t model, one only needs to simulate the
$p$-dimensional normal vector $-F$ with $p < m$ and an independent gamma-distributed
variate $V = W^{-1}$. In the second step of the simulation one simply conducts a series
of independent Bernoulli experiments with default probabilities $p_i(\Psi)$ to decide
whether individual companies default.
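The two-step simulation scheme can be sketched for the one-factor Gaussian case of (11.23), where $a_i'\psi$ reduces to a scalar standard normal $\psi$. This is only an illustration under assumed inputs: the default probabilities, the asset correlations and the function names are hypothetical, and the t model of (11.24) would additionally need a gamma variate and a t quantile function, which are not included here.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()

def conditional_pd(p_i, beta_i, psi):
    """p_i(psi) = Phi((Phi^{-1}(p_i) + sqrt(beta_i) psi) / sqrt(1 - beta_i))."""
    d_i = STD_NORMAL.inv_cdf(p_i)
    return STD_NORMAL.cdf((d_i + beta_i ** 0.5 * psi) / (1.0 - beta_i) ** 0.5)

def simulate_defaults(pds, betas, rng):
    """One scenario: draw the factor, then independent Bernoulli defaults."""
    psi = rng.gauss(0.0, 1.0)                      # step 1: systematic factor
    return [1 if rng.random() < conditional_pd(p, b, psi) else 0
            for p, b in zip(pds, betas)]           # step 2: Bernoulli draws

rng = random.Random(42)
pds = [0.01, 0.02, 0.05]      # assumed unconditional default probabilities
betas = [0.2, 0.2, 0.3]       # assumed systematic-risk weights beta_i
n_scenarios = 20000
counts = [0] * len(pds)
for _ in range(n_scenarios):
    for i, y in enumerate(simulate_defaults(pds, betas, rng)):
        counts[i] += y
default_rates = [c / n_scenarios for c in counts]
print(default_rates)
```

The simulated default frequencies should be close to the assumed unconditional probabilities, because averaging $p_i(\Psi)$ over the standard normal factor recovers $p_i$.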


Application to Archimedean copula models. Another class of threshold models
with an equivalent mixture representation is provided by models where the critical
variables have an exchangeable LT-Archimedean copula in the sense of
Definition 7.52. Consider a threshold model $(X, d)$, where $X$ has an exchangeable
LT-Archimedean copula $C$ with generator given by the Laplace transform $\hat{G}$ of some
df $G$ on $[0, \infty)$ with $G(0) = 0$. Let $p = (p_1, \ldots, p_m)'$ denote the vector of default
probabilities.

Consider now a non-negative rv $\Psi \sim G$ and rvs $U_1, \ldots, U_m$ that are conditionally
independent given $\Psi$ with conditional distribution function
$P(U_i \le u \mid \Psi = \psi) = \exp(-\psi \hat{G}^{-1}(u))$ for $u \in [0, 1]$. Proposition 7.51 then shows that $U$ has df $C$.
Moreover, by Lemma 11.2, $(X, d)$ and $(U, p)$ are two equivalent threshold models
for default. By construction, $U$ has a one-dimensional conditional independence
structure with conditioning variable $\Psi$, and the conditional default probabilities are
given by

\[
p_i(\psi) = P(U_i \le p_i \mid \Psi = \psi) = \exp(-\psi \hat{G}^{-1}(p_i)). \tag{11.25}
\]

In order to simulate from a threshold model based on an LT-Archimedean copula
we may therefore use the following efficient and simple approach. In a first step
we simulate a realization $\psi$ of $\Psi$ and then we conduct $m$ independent Bernoulli
experiments with default probabilities $p_i(\psi)$ as in (11.25) to simulate a realization
of the defaulting counterparties.


Example 11.13 (Clayton copula). As an example consider the Clayton copula with
parameter $\theta > 0$. Suppose we wish to construct an exchangeable Bernoulli mixture
model with default probability $\pi$ and joint default probability $\pi_2$ that is equivalent
to a threshold model with the Clayton copula for the critical variables. As mentioned
in Algorithm 7.53, a gamma-distributed rv $\Psi \sim \mathrm{Ga}(1/\theta, 1)$ (see Section A.2.4 for
a definition) has Laplace transform $\hat{G}(t) = (1 + t)^{-1/\theta}$. Using (11.25), the mixing
variable of the equivalent Bernoulli mixture model can be defined by setting
$Q = p_i(\Psi) = \exp(-\Psi(\pi^{-\theta} - 1))$.

Using (11.4), the required value of $\theta$ to give the desired joint default probabilities
is the solution to the equation $\pi_2 = C_\theta(\pi, \pi) = (2\pi^{-\theta} - 1)^{-1/\theta}$, $\theta > 0$. It is
easily seen that $\pi_2$ and, hence, the default correlation in our exchangeable Bernoulli
mixture model are increasing in $\theta$; for $\theta \to 0$ we obtain independent defaults and
for $\theta \to \infty$ defaults become comonotonic and default correlation tends to one.
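Since $\pi_2$ is increasing in $\theta$, the calibration equation can be solved by simple bisection. The sketch below is one possible approach, not a prescribed method; the function names, the bracketing interval and the input values $\pi = 0.049$, $\pi_2 = 0.00313$ are assumptions chosen for illustration. It also samples the mixing variable $Q$ to check the first two moments by simulation.

```python
import math
import random

def clayton_joint_pd(pi, theta):
    """pi_2 = C_theta(pi, pi) = (2 pi^{-theta} - 1)^{-1/theta}."""
    return (2.0 * pi ** (-theta) - 1.0) ** (-1.0 / theta)

def calibrate_theta(pi, pi_2, lo=1e-6, hi=50.0, tol=1e-10):
    """Bisection on theta; relies on pi_2 being increasing in theta."""
    if not pi * pi < pi_2 < pi:
        raise ValueError("need pi^2 < pi_2 < pi")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if clayton_joint_pd(pi, mid) < pi_2:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def sample_q(pi, theta, rng):
    """Draw Q = exp(-Psi (pi^{-theta} - 1)) with Psi ~ Ga(1/theta, 1)."""
    psi = rng.gammavariate(1.0 / theta, 1.0)   # shape 1/theta, scale 1
    return math.exp(-psi * (pi ** (-theta) - 1.0))

pi, pi_2 = 0.049, 0.00313                      # assumed target moments
theta = calibrate_theta(pi, pi_2)
rng = random.Random(1)
draws = [sample_q(pi, theta, rng) for _ in range(50000)]
mean_q = sum(draws) / len(draws)
mean_q2 = sum(q * q for q in draws) / len(draws)
print(theta, mean_q, mean_q2)
```

The sample means of $Q$ and $Q^2$ should be close to $\pi$ and $\pi_2$, since $E(Q) = \hat{G}(\pi^{-\theta} - 1) = \pi$ and $E(Q^2) = \hat{G}(2(\pi^{-\theta} - 1)) = \pi_2$.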


11.2.5 Poisson Mixture Models and CreditRisk+


Since default is typically a rare event, it is possible to approximate Bernoulli indicator
rvs for default with Poisson rvs and to approximate Bernoulli mixture models with
Poisson mixture models. By choosing independent gamma distributions for the
economic factors $\Psi$ and using the Poisson approximation, we obtain a particularly
tractable model for portfolio losses, known as CreditRisk+.


Poisson approximation and Poisson mixture models. To be more precise, assume
that, given the factors $\Psi$, the default indicator variables $Y_1, \ldots, Y_m$ for a particular
time horizon are conditionally independent Bernoulli variables satisfying
$P(Y_i = 1 \mid \Psi = \psi) = p_i(\psi)$. Moreover, assume that the distribution of $\Psi$ is such that
the conditional default probabilities $p_i(\psi)$ tend to be very small. In this case the $Y_i$
variables can be approximated by conditionally independent Poisson variables $\tilde{Y}_i$
satisfying $\tilde{Y}_i \mid \Psi = \psi \sim \mathrm{Poi}(p_i(\psi))$, since

\[
\begin{aligned}
P(\tilde{Y}_i = 0 \mid \Psi = \psi) &= e^{-p_i(\psi)} \approx 1 - p_i(\psi), \\
P(\tilde{Y}_i = 1 \mid \Psi = \psi) &= p_i(\psi)\, e^{-p_i(\psi)} \approx p_i(\psi).
\end{aligned}
\]

Moreover, the portfolio loss $L = \sum_{i=1}^m e_i \delta_i Y_i$ can be approximated by
$\tilde{L} = \sum_{i=1}^m e_i \delta_i \tilde{Y}_i$. Of course, it is possible for a company to "default more than once" in
the approximating Poisson model, albeit with a very low probability.
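The quality of the Poisson approximation is easy to inspect numerically. The following sketch, with an illustrative function name and assumed probability values, compares the Bernoulli and Poisson probabilities of zero and one default and reports the probability that the Poisson variable exceeds one.

```python
import math

def bernoulli_vs_poisson(p):
    """Compare Bernoulli(p) with Poi(p) at 0 and 1, plus the Poisson
    probability of 'more than one default'."""
    pois_0 = math.exp(-p)
    pois_1 = p * math.exp(-p)
    return {
        "bernoulli": (1.0 - p, p),
        "poisson": (pois_0, pois_1),
        "poisson_multiple_defaults": 1.0 - pois_0 - pois_1,
    }

for p in (0.001, 0.01, 0.05):   # assumed conditional default probabilities
    print(p, bernoulli_vs_poisson(p))
```

The probability of multiple defaults is of order $p^2/2$, which is why the approximation is accurate only when the conditional default probabilities are small.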

We now give a formal definition of a Poisson mixture model for counting variables 
that parallels the definition of a Bernoulli mixture model in Section 11.2.1. 


Definition 11.14 (Poisson mixture model). Given some $p < m$ and a $p$-dimensional
random vector $\Psi = (\Psi_1, \ldots, \Psi_p)'$, the random vector $\tilde{Y} = (\tilde{Y}_1, \ldots, \tilde{Y}_m)'$
follows a Poisson mixture model with factors $\Psi$ if there are functions
$\lambda_i : \mathbb{R}^p \to (0, \infty)$, $1 \le i \le m$, such that, conditional on $\Psi = \psi$, the random vector $\tilde{Y}$ is a
vector of independent Poisson distributed rvs with rate parameters $\lambda_i(\psi)$.


If $\tilde{Y}$ follows a Poisson mixture model and if we define the indicators $Y_i = I_{\{\tilde{Y}_i \ge 1\}}$,
then $Y$ follows a Bernoulli mixture model and the mixing variables are related by
$p_i(\cdot) = 1 - e^{-\lambda_i(\cdot)}$.


The CreditRisk+ model. The CreditRisk+ model for credit risk was proposed by
Credit Suisse Financial Products in 1997 (see Credit Suisse Financial Products
1997). It has the structure of the Poisson mixture model in Definition 11.14, where
the factor vector $\Psi$ consists of $p$ independent gamma-distributed rvs. The
distributional assumptions and functional forms imposed in CreditRisk+ make it possible to
compute the distribution of the number of defaults and the aggregate portfolio loss
fairly explicitly using techniques for compound distributions and mixture
distributions that are well known in actuarial mathematics and which are also discussed in
Chapter 13 (see Sections 13.2.2 and 13.2.4 in particular).

The (stochastic) parameter $\lambda_i(\Psi)$ of the conditional Poisson distribution for firm $i$
is assumed to take the form

\[
\lambda_i(\Psi) = k_i w_i' \Psi, \tag{11.26}
\]

for a constant $k_i > 0$, for non-negative factor weights $w_i = (w_{i1}, \ldots, w_{ip})'$ satisfying
$\sum_j w_{ij} = 1$, and for $p$ independent $\mathrm{Ga}(\alpha_j, \beta_j)$-distributed factors $\Psi_1, \ldots, \Psi_p$
with parameters set to be $\alpha_j = \beta_j = \sigma_j^{-2}$ for $\sigma_j > 0$ and $j = 1, \ldots, p$. This
parametrization of the gamma variables ensures that we have $E(\Psi_j) = 1$ and
$\mathrm{var}(\Psi_j) = \sigma_j^2$.

It is easy to verify that

\[
E(\tilde{Y}_i) = E(E(\tilde{Y}_i \mid \Psi)) = E(\lambda_i(\Psi)) = k_i E(w_i' \Psi) = k_i,
\]

so that $k_i$ is the expected number of defaults for obligor $i$ over the time period.
Setting $Y_i = I_{\{\tilde{Y}_i \ge 1\}}$ we also observe that

\[
P(Y_i = 1) = E(P(\tilde{Y}_i > 0 \mid \Psi)) = E(1 - \exp(-k_i w_i' \Psi)) \approx k_i E(w_i' \Psi) = k_i,
\]

for $k_i$ small, so that $k_i$ is approximately equal to the default probability.


Remark 11.15. The exchangeable version of CreditRisk+ is extremely close to an
exchangeable Bernoulli mixture model with beta mixing distribution. To see this,
observe that in the exchangeable case the implied Bernoulli mixture model for $Y$
has mixing variable $Q$ given by $Q = 1 - e^{-k\Psi}$ for some $\Psi \sim \mathrm{Ga}(\alpha, \beta)$, $k > 0$ and
$\alpha = \beta$. For $q \in (0, 1)$ we therefore obtain

\[
P(Q \le q) = P(1 - e^{-k\Psi} \le q) = P\!\left(\Psi \le \frac{-\ln(1-q)}{k}\right),
\]

so that the densities $g_Q$ and $g_\Psi$ are related by
$g_Q(q) = g_\Psi(-\ln(1-q)/k)/(k(1-q))$. Using the form of the density of the $\mathrm{Ga}(\alpha, \beta)$ distribution we obtain

\[
\begin{aligned}
g_Q(q) &= \frac{1}{k(1-q)} \frac{\beta^\alpha}{\Gamma(\alpha)} \left(\frac{-\ln(1-q)}{k}\right)^{\alpha-1} \exp\!\left(\frac{\beta \ln(1-q)}{k}\right) \\
&= \left(\frac{\beta}{k}\right)^{\alpha} \frac{1}{\Gamma(\alpha)} (-\ln(1-q))^{\alpha-1} (1-q)^{\beta/k - 1}.
\end{aligned}
\]

In a realistic credit risk model the parameters are chosen in such a way that the mass
of the distribution of $Q$ is concentrated on values of $q$ close to zero, since default is
typically a rare event. For small $q$ we may use the approximation $-\ln(1-q) \approx q$ to
observe that the functional form of $g_Q$ is extremely close to that of a beta distribution
with parameters $\alpha$ and $\beta/k$, where we again recall that the model is parametrized
to have $\alpha = \beta$.

Distribution of the number of defaults. In CreditRisk+ we have that, given $\Psi = \psi$,
$\tilde{Y}_i \sim \mathrm{Poi}(k_i w_i' \psi)$, which implies that the distribution of the number of defaults
$\tilde{M} := \sum_{i=1}^m \tilde{Y}_i$ satisfies

\[
\tilde{M} \mid \Psi = \psi \sim \mathrm{Poi}\Bigl(\sum_{i=1}^m k_i w_i' \psi\Bigr), \tag{11.27}
\]

since the sum of independent Poisson variables is also a Poisson variable with a rate
parameter given by the sum of the rate parameters of the independent variables.

To compute the unconditional distribution of $\tilde{M}$ we require a well-known result
on mixed Poisson distributions, which appears as Proposition 13.21 in a discussion
of relevant actuarial methodology for quantitative risk management in Chapter 13.
This result says that if the rv $N$ is conditionally Poisson with a gamma-distributed
rate parameter $\Lambda \sim \mathrm{Ga}(\alpha, \beta)$, then $N$ has a negative binomial distribution,
$N \sim \mathrm{NB}(\alpha, \beta/(\beta + 1))$.

In the case when $p = 1$ we may apply this result directly to (11.27) to deduce that
$\tilde{M}$ has a negative binomial distribution (since a constant times a gamma variable
remains gamma distributed). For arbitrary $p$ we now show that $\tilde{M}$ is equal in
distribution to a sum of $p$ independent negative binomial rvs. This follows by observing
that

\[
\sum_{i=1}^m k_i w_i' \Psi = \sum_{i=1}^m \sum_{j=1}^p k_i w_{ij} \Psi_j = \sum_{j=1}^p \Psi_j \Bigl(\sum_{i=1}^m k_i w_{ij}\Bigr).
\]

Now consider rvs $\tilde{M}_1, \ldots, \tilde{M}_p$ such that $\tilde{M}_j$ is conditionally Poisson with mean
$(\sum_{i=1}^m k_i w_{ij})\psi_j$ conditional on $\Psi_j = \psi_j$. The independence of the components
$\Psi_1, \ldots, \Psi_p$ implies that the $\tilde{M}_j$ are independent, and by construction we have
$\tilde{M} \stackrel{d}{=} \sum_{j=1}^p \tilde{M}_j$. Moreover, the rvs $(\sum_{i=1}^m k_i w_{ij})\Psi_j$ are gamma distributed, so that
each of the $\tilde{M}_j$ has a negative binomial distribution by Proposition 13.21.
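The representation of the number of defaults as a sum of independent negative binomial rvs translates directly into a numerical recipe: compute each negative binomial pmf and convolve. The sketch below assumes that every factor carries positive total weight $\sum_i k_i w_{ij} > 0$; the portfolio data and function names are hypothetical. It uses the fact that $(\sum_i k_i w_{ij})\Psi_j$ is gamma with shape $\sigma_j^{-2}$, so that the negative binomial success parameter is $(1 + \sigma_j^2 \sum_i k_i w_{ij})^{-1}$.

```python
import math

def negbin_pmf(n, alpha, p):
    """P(N = n) = Gamma(alpha + n) / (Gamma(alpha) n!) p^alpha (1 - p)^n."""
    log_prob = (math.lgamma(alpha + n) - math.lgamma(alpha) - math.lgamma(n + 1)
                + alpha * math.log(p) + n * math.log1p(-p))
    return math.exp(log_prob)

def convolve(f, g, n_max):
    """Convolution of two pmfs on {0, ..., n_max}, truncated at n_max."""
    return [sum(f[i] * g[n - i] for i in range(n + 1)) for n in range(n_max + 1)]

def default_count_pmf(k, w, sigma2, n_max):
    """pmf of the number of defaults as a convolution of p negative binomials."""
    pmf = [1.0] + [0.0] * n_max                  # start from a point mass at 0
    for j, s2 in enumerate(sigma2):
        c_j = sum(k_i * w_i[j] for k_i, w_i in zip(k, w))   # sum_i k_i w_ij
        p_j = 1.0 / (1.0 + s2 * c_j)             # NB success parameter
        nb = [negbin_pmf(n, 1.0 / s2, p_j) for n in range(n_max + 1)]
        pmf = convolve(pmf, nb, n_max)
    return pmf

# Hypothetical two-factor portfolio of 100 obligors.
k = [0.01] * 50 + [0.03] * 50                    # expected default counts k_i
w = [(0.7, 0.3)] * 50 + [(0.2, 0.8)] * 50        # factor weights w_i
sigma2 = (1.0, 0.5)                              # factor variances sigma_j^2
pmf_m = default_count_pmf(k, w, sigma2, n_max=60)
mean_m = sum(n * q for n, q in enumerate(pmf_m))
var_m = sum(n * n * q for n, q in enumerate(pmf_m)) - mean_m ** 2
print(mean_m, var_m)
```

The mean equals $\sum_i k_i$, and the variance exceeds it by $\sum_j \sigma_j^2 (\sum_i k_i w_{ij})^2$, which quantifies the overdispersion induced by the gamma factors.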


Distribution of the aggregate loss. To obtain a tractable model, exposures are
discretized in CreditRisk+ using the concept of exposure bands. The CreditRisk+
documentation (see Credit Suisse Financial Products 1997) suggests that the LGD
can be subsumed in the exposure by multiplying the actual exposure by a value for
the LGD that is typical for an obligor with the same credit rating. We will adopt this
approach and assume that the losses arising from the individual obligors are of the
form $\tilde{L}_i = e_i \tilde{Y}_i$, where the $e_i$ are known (LGD-adjusted) exposures.

For all $i$, we discretize $e_i$ in units of an amount $\epsilon$; we replace $e_i$ by a value
$\ell_i \epsilon \ge e_i$, where $\ell_i$ is a positive integer multiplier. We now define exposure bands
$b = 1, \ldots, n$ corresponding to the distinct values $\ell^{(1)}, \ldots, \ell^{(n)}$ for the multipliers.
In other words, we group obligors in exposure bands according to the values of their
discretized exposures.

Let $s_b$ denote the set of indices for the obligors in exposure band $b$; this means
that $i \in s_b \iff \ell_i = \ell^{(b)}$. Let $\tilde{L}^{(b)} = \sum_{i \in s_b} \epsilon \ell_i \tilde{Y}_i$ denote the aggregate loss in
exposure band $b$. We have $\tilde{L}^{(b)} = \epsilon \ell^{(b)} \tilde{M}^{(b)}$, where $\tilde{M}^{(b)} = \sum_{i \in s_b} \tilde{Y}_i$ denotes the
number of defaults in exposure band $b$. Let $\tilde{L} = \sum_{b=1}^n \tilde{L}^{(b)}$ denote the aggregate
portfolio loss.

We now want to determine the distribution of $\tilde{L}$. The following theorem gives the
necessary information for achieving this with Fourier inversion.


Theorem 11.16. Let $\tilde{L}$ represent the aggregate loss in the general p-factor
CreditRisk+ model with exposures discretized into exposure bands as described
above. The following then hold.

(i) The Laplace-Stieltjes transform of the df of $\tilde{L}$ is given by

\[
\hat{F}_{\tilde{L}}(s) = \prod_{j=1}^{p} \Bigl(1 + \sigma_j^2 \sum_{i=1}^{m} k_i w_{ij} \Bigl(1 - \sum_{b=1}^{n} e^{-s \epsilon \ell^{(b)}} q_{jb}\Bigr)\Bigr)^{-\sigma_j^{-2}}, \tag{11.28}
\]

where $q_{jb} = \sum_{i \in s_b} k_i w_{ij} / \sum_{i=1}^m k_i w_{ij}$ for $b = 1, \ldots, n$.

(ii) The distribution of $\tilde{L}$ has the structure $\tilde{L} \stackrel{d}{=} \sum_{j=1}^p Z_j$, where the $Z_j$ are
independent variables that follow a compound negative binomial distribution.
More precisely, it holds that $Z_j \sim \mathrm{CNB}(\sigma_j^{-2}, \theta_j, G_{X_j})$ with
$\theta_j = (1 + \sigma_j^2 \sum_{i=1}^m k_i w_{ij})^{-1}$ and $G_{X_j}$ the df of a multinomial random variable $X_j$
taking the value $\epsilon \ell^{(b)}$ with probability $q_{jb}$.


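Proof. The proof requires the mathematics of compound distributions as described
in Section 13.2.2.

(i) Conditional on $\Psi = \psi$, $\tilde{M}^{(b)} = \sum_{i \in s_b} \tilde{Y}_i$ has a Poisson distribution with parameter

\[
\lambda^{(b)}(\psi) := \sum_{i \in s_b} \lambda_i(\psi) = \sum_{i \in s_b} k_i w_i' \psi,
\]

and the loss $\tilde{L}^{(b)} = \epsilon \ell^{(b)} \tilde{M}^{(b)}$ in exposure band $b$ has a compound Poisson
distribution given by

\[
\tilde{L}^{(b)} \mid \Psi = \psi \sim \mathrm{CPoi}(\lambda^{(b)}(\psi), G^{(b)}),
\]

where $G^{(b)}$ is the df of point mass at $\epsilon \ell^{(b)}$. It follows from Proposition 13.10 on
sums of compound Poisson variables that

\[
\tilde{L} \mid \Psi = \psi \sim \mathrm{CPoi}\Bigl(\lambda(\psi), \sum_{b=1}^{n} \frac{\lambda^{(b)}(\psi)}{\lambda(\psi)} G^{(b)}\Bigr), \tag{11.29}
\]

where $\lambda(\psi) := \sum_{b=1}^n \lambda^{(b)}(\psi)$. The (conditional) severity distribution in (11.29)
is the distribution function of a multinomial random variable that takes the values
$\epsilon \ell^{(b)}$ with probabilities $\lambda^{(b)}(\psi)/\lambda(\psi)$ for $b = 1, \ldots, n$.

Writing $\hat{F}_{\tilde{L} \mid \Psi}(s \mid \psi)$ for the Laplace-Stieltjes transform of the conditional
distribution function of $\tilde{L}$ given $\Psi$, we can use equation (13.11) and Example 13.5 to
infer that

\[
\begin{aligned}
\hat{F}_{\tilde{L} \mid \Psi}(s \mid \psi) &= \exp\Bigl(-\lambda(\psi)\Bigl(1 - \sum_{b=1}^{n} \frac{\lambda^{(b)}(\psi)}{\lambda(\psi)} e^{-s \epsilon \ell^{(b)}}\Bigr)\Bigr) \\
&= \exp\Bigl(-\Bigl(\sum_{i=1}^{m} \sum_{j=1}^{p} k_i w_{ij} \psi_j - \sum_{b=1}^{n} e^{-s \epsilon \ell^{(b)}} \sum_{i \in s_b} \sum_{j=1}^{p} k_i w_{ij} \psi_j\Bigr)\Bigr) \\
&= \exp\Bigl(-\sum_{j=1}^{p} \psi_j \Bigl(\sum_{i=1}^{m} k_i w_{ij} - \sum_{b=1}^{n} e^{-s \epsilon \ell^{(b)}} \sum_{i \in s_b} k_i w_{ij}\Bigr)\Bigr) \\
&= \prod_{j=1}^{p} \exp\Bigl(-\psi_j \sum_{i=1}^{m} k_i w_{ij} \Bigl(1 - \sum_{b=1}^{n} e^{-s \epsilon \ell^{(b)}} q_{jb}\Bigr)\Bigr).
\end{aligned}
\]

Writing $g_{\Psi_j}$ for the density of the factor $\Psi_j$ and using the independence of the
factors, it follows that

\[
\hat{F}_{\tilde{L}}(s) = \prod_{j=1}^{p} \int_0^\infty \exp\Bigl(-\psi_j \sum_{i=1}^{m} k_i w_{ij} \Bigl(1 - \sum_{b=1}^{n} e^{-s \epsilon \ell^{(b)}} q_{jb}\Bigr)\Bigr) g_{\Psi_j}(\psi_j)\, d\psi_j,
\]

and equation (11.28) is derived by evaluating the integrals and recalling that the
parameters of the gamma distribution are chosen to be $\alpha_j = \beta_j = \sigma_j^{-2}$.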


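(ii) To see that this is the Laplace-Stieltjes transform of the df of a sum of independent
compound negative binomial distributions, observe that (11.28) may be written as

\[
\hat{F}_{\tilde{L}}(s) = \prod_{j=1}^{p} \Bigl(1 + \sigma_j^2 \sum_{i=1}^{m} k_i w_{ij} \bigl(1 - \hat{G}_{X_j}(s)\bigr)\Bigr)^{-\sigma_j^{-2}}, \tag{11.30}
\]

where $\hat{G}_{X_j}$ is the Laplace-Stieltjes transform of $G_{X_j}$. Substituting
$\theta_j = (1 + \sigma_j^2 \sum_{i=1}^m k_i w_{ij})^{-1}$ we obtain

\[
\hat{F}_{\tilde{L}}(s) = \prod_{j=1}^{p} \left(\frac{\theta_j}{1 - \hat{G}_{X_j}(s)(1 - \theta_j)}\right)^{\sigma_j^{-2}}
\]

and, by comparing with Example 13.6, this can be seen to be the product of
Laplace-Stieltjes transforms of the dfs of variables $Z_j \sim \mathrm{CNB}(\sigma_j^{-2}, \theta_j, G_{X_j})$.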


This theorem gives the key information required to evaluate the distribution of $\tilde{L}$
by applying Fourier inversion to the characteristic function of $\tilde{L}$. Indeed, we have

\[
\phi_{\tilde{L}}(s) = \prod_{j=1}^{p} \left(\frac{\theta_j}{1 - \phi_{X_j}(s)(1 - \theta_j)}\right)^{\sigma_j^{-2}},
\]

where $\phi_{X_j}$ is the characteristic function of the multinomial severity distribution. It
is now straightforward to compute $\phi_{X_j}$ using the fast Fourier transform and hence
to compute the cf of $\tilde{L}$. The probability mass function of $\tilde{L}$ can then be computed
by inverting $\phi_{\tilde{L}}$ using the inverse fast Fourier transform.
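The inversion step can be illustrated with a small standard-library sketch. For brevity it swaps in a plain $O(N^2)$ inverse discrete Fourier transform in place of the fast Fourier transform, measures losses in units of $\epsilon$, and assumes the grid size is large enough that the mass beyond it is negligible; the portfolio data and function names are hypothetical.

```python
import cmath
import math

def loss_pmf(k, w, sigma2, ell, n_grid):
    """pmf of the loss (in units of epsilon) on {0, ..., n_grid - 1}.

    Evaluates the cf of the loss at the Fourier frequencies 2 pi h / n_grid
    and inverts it with a direct inverse DFT."""
    bands = sorted(set(ell))
    p = len(sigma2)
    c = [sum(k_i * w_i[j] for k_i, w_i in zip(k, w)) for j in range(p)]
    # q[j][b]: probability that the severity X_j takes the value of band b.
    q = [[sum(k_i * w_i[j] for k_i, w_i, l_i in zip(k, w, ell) if l_i == b) / c[j]
          for b in bands] for j in range(p)]
    theta = [1.0 / (1.0 + sigma2[j] * c[j]) for j in range(p)]
    cf = []
    for h in range(n_grid):
        u = 2.0 * math.pi * h / n_grid
        value = 1.0 + 0.0j
        for j in range(p):
            phi_x = sum(q[j][i] * cmath.exp(1j * u * b) for i, b in enumerate(bands))
            value *= (theta[j] / (1.0 - (1.0 - theta[j]) * phi_x)) ** (1.0 / sigma2[j])
        cf.append(value)
    pmf = []
    for n in range(n_grid):
        total = sum(cf[h] * cmath.exp(-2j * math.pi * h * n / n_grid)
                    for h in range(n_grid))
        pmf.append(max(total.real / n_grid, 0.0))   # clip rounding noise
    return pmf

# Hypothetical two-factor portfolio with three exposure bands.
k = [0.01] * 50 + [0.03] * 50
w = [(0.7, 0.3)] * 50 + [(0.2, 0.8)] * 50
sigma2 = (1.0, 0.5)
ell = [1] * 50 + [2] * 30 + [5] * 20             # integer multipliers ell_i
pmf_loss = loss_pmf(k, w, sigma2, ell, n_grid=256)
mean_loss = sum(n * prob for n, prob in enumerate(pmf_loss))
print(pmf_loss[0], mean_loss)
```

The complex power uses the principal branch, which is valid here because $1 - (1 - \theta_j)\phi_{X_j}(u)$ always has positive real part. For realistic portfolio sizes an FFT routine would replace the double loop.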
Notes and Comments


The logit-normal mixture model can be thought of as a one-factor version of the
CreditPortfolioView model of Wilson (1997a,b). Details of this model can be found
in Section 5 of Crouhy, Galai and Mark (2000). Further details of the beta-binomial
distribution can be found in Joe (1997).

The rating agency Moody's uses a so-called binomial expansion technique to
model default dependence in a simplistic way. The method, which is very popular
with practitioners, is not based on a formal default risk model but is related to
binomial distributions. The basic idea is to approximate a portfolio of $m$ dependent
counterparties by a homogeneous portfolio of $d < m$ independent counterparties
with adjusted exposures and identical default probabilities; the index $d$ is called the
diversity score and is chosen according to rules defined by Moody's. For further
information we refer to Davis and Lo (2001) and to Section 9.2.7 of Lando (2004).

The equivalence between threshold models and mixture models has been observed
by Koyluoglu and Hickman (1998) and Gordy (2000) for the special case of
CreditMetrics and CreditRisk+. Applications of Proposition 7.51 to credit risk modelling
are also discussed in Schönbucher (2005). The study of mixture representations
for sequences of exchangeable Bernoulli rvs is related to a well-known result
of de Finetti, which states that any infinite sequence $Y_1, Y_2, \ldots$ of exchangeable
Bernoulli rvs has a representation as an exchangeable Bernoulli mixture; see, for
instance, Theorem 35.10 in Billingsley (1995) for a precise statement. Any
exchangeable model for $Y$ that can be extended to arbitrary portfolio size $m$ therefore has a
representation as an exchangeable Bernoulli mixture model.

A comprehensive description of CreditRisk+ is given in its original
documentation (Credit Suisse Financial Products 1997). An excellent discussion of the model
structure from a more academic viewpoint is provided in Gordy (2000). Both sources
also provide further information concerning the calibration of the factor variances $\sigma_j^2$
and factor weights $w_{ij}$. The derivation of recursion formulas for the probabilities
$P(\tilde{M} = k)$, $k = 0, 1, \ldots$, via Panjer recursion is given in Appendix A10 of
the CreditRisk+ documentation. In Gordy (2002) an alternative approach to the
computation of the loss distribution in CreditRisk+ is proposed using the
saddle-point approximation (see, for example, Jensen 1995). Further numerical work for
CreditRisk+ can be found in papers by Kurth and Tasche (2003), Glasserman (2004)
and Haaf, Reiss and Schoenmakers (2004). Importance-sampling techniques for
CreditRisk+ are discussed in Glasserman and Li (2005).


11.3 Asymptotics for Large Portfolios 


We now provide some asymptotic results for large portfolios in Bernoulli mixture 
models. These results can be used to approximate the credit loss distribution and 
associated risk measures in a large portfolio. Moreover, they are useful for iden- 
tifying the key sources of model risk in a Bernoulli mixture model. In particular, 
we will see that in one-factor models the tail of the loss distribution is essentially
determined by the tail of the mixing distribution, which has direct consequences for
the analysis of model risk in mixture models and for the setting of capital adequacy
rules for loan books.


11.3.1 Exchangeable Models 


We begin our discussion of the asymptotic properties of Bernoulli mixture models
with the special case of an exchangeable model. We consider an infinite sequence
of obligors indexed by $i \in \mathbb{N}$ with identical exposures $e_i = e$ and LGD equal to
100%. We assume that, given a mixing variable $Q \in [0, 1]$, the default indicators
$Y_i$ are independent Bernoulli random variables with conditional default probability
$P(Y_i = 1 \mid Q = q) = q$. This simple model can be viewed as an idealization of a
large pool of homogeneous obligors.

We are interested in the asymptotic behaviour of the relative loss (the loss
expressed as a proportion of total exposure). Writing $L^{(m)} = \sum_{i=1}^m e Y_i$ for the
total loss of the first $m$ companies, the corresponding relative loss is given by

\[
\frac{L^{(m)}}{me} = \frac{1}{m} \sum_{i=1}^{m} Y_i.
\]

Conditioning on $Q = q$, the $Y_i$ are independent with mean $q$ and the strong law
of large numbers implies that, given $Q = q$, $\lim_{m \to \infty} L^{(m)}/(me) = q$ almost
surely. This shows that, for large $m$, the behaviour of the relative loss is essentially
governed by the mixing distribution $G(q)$ of $Q$. In particular, it can be shown that,
for $G$ strictly increasing,

\[
\lim_{m \to \infty} \mathrm{VaR}_\alpha\!\left(\frac{L^{(m)}}{me}\right) = q_\alpha(Q) \tag{11.31}
\]

(see Proposition 11.18 below).
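The convergence in (11.31) can be checked by simulation. The sketch below assumes a beta mixing distribution with illustrative parameters (a = 3.08, b = 59.8), a portfolio of 2000 obligors and a crude empirical quantile estimator; all names and values are hypothetical choices made for the illustration.

```python
import random

def relative_loss(m, a, b, rng):
    """One scenario: draw Q ~ Beta(a, b), then m conditionally independent
    defaults. Returns the relative loss (1/m) sum Y_i together with Q."""
    q = rng.betavariate(a, b)
    defaults = sum(1 for _ in range(m) if rng.random() < q)
    return defaults / m, q

def empirical_quantile(values, alpha):
    """Crude empirical alpha-quantile based on order statistics."""
    ordered = sorted(values)
    index = min(int(alpha * len(ordered)), len(ordered) - 1)
    return ordered[index]

rng = random.Random(7)
a, b, alpha = 3.08, 59.8, 0.99                   # assumed parameters
scenarios = [relative_loss(2000, a, b, rng) for _ in range(2000)]
var_loss = empirical_quantile([s[0] for s in scenarios], alpha)
quantile_q = empirical_quantile([s[1] for s in scenarios], alpha)
mean_abs_gap = sum(abs(s[0] - s[1]) for s in scenarios) / len(scenarios)
print(var_loss, quantile_q, mean_abs_gap)
```

For a portfolio of this size the estimated VaR of the relative loss and the estimated quantile of $Q$ should nearly coincide, and the scenario-wise gap between the relative loss and $Q$ should be small.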

These results can be used to analyse model risk in exchangeable Bernoulli mixture
models. We consider the risk related to the choice of mixing distribution under the
constraint that the default probability $\pi$ and the default correlation $\rho_Y$ (or,
equivalently, $\pi$ and $\pi_2$) are known and fixed.

According to (11.31), for large $m$ the tail of $L^{(m)}$ is essentially determined by the
tail of the mixing variable $Q$. In Figure 11.2 we plot the tail function of the
probit-normal distribution (corresponding to the Gaussian threshold model), the
logit-normal distribution, the beta distribution (close to CreditRisk+; see Remark 11.15)
and the mixture distribution corresponding to the Clayton copula (see
Example 11.13). The plots are shown on a logarithmic scale and in all cases the first
two moments have the values $\pi = 0.049$ and $\pi_2 = 0.00313$, which correspond
roughly to Standard & Poor's rating category B; the parameter values for each of
the models can be found in Table 11.3.

Inspection of Figure 11.2 shows that the tail functions differ significantly only
after the 99% quantile, the logit-normal distribution being the one with the heaviest
tail. From a practical point of view this means that the particular parametric form of
the mixing distribution in a Bernoulli mixture model is of lesser importance once $\pi$
and $\rho_Y$ have been fixed. Of course this does not mean that Bernoulli mixtures are
immune to model risk; the tail of $L^{(m)}$ is quite sensitive to $\pi$ and, in particular, to $\rho_Y$,
and these parameters are not easily estimated (see Section 11.5.4 for a discussion
of statistical inference for mixture models).


Figure 11.2. The tail of the mixing distribution of $Q$ in four different exchangeable
Bernoulli-mixture models: beta, probit-normal, logit-normal and Clayton. In all cases the
first two moments have the values $\pi = 0.049$ and $\pi_2 = 0.00313$, which correspond roughly
to Standard & Poor's rating category B; the actual parameter values can be found in Table 11.3.
The horizontal line at $10^{-2}$ shows that the models only really start to differ around the 99th
percentile of the mixing distribution.


Table 11.3. Parameter values for various exchangeable Bernoulli mixture models with
identical values of $\pi$ and $\pi_2$ (and $\rho_Y$). The values of $\pi$ and $\pi_2$ correspond roughly to Standard &
Poor's ratings CCC, B and BB (in fact, they have been estimated from 20 years of Standard
& Poor's default data using the simple moment estimator in (11.50)). This table is used in
the model-risk study of Section 11.1.5 and the simulation study of Section 11.5.2.

Model            Parameter    CCC       B          BB
All models       $\pi$        0.188     0.049      0.0112
                 $\pi_2$      0.042     0.00313    0.000197
                 $\rho_Y$     0.0446    0.0157     0.00643
Beta             $a$          4.02      3.08       1.73
                 $b$          17.4      59.8       153
Probit-normal    $\mu$        -0.93     -1.71      -2.37
                 $\sigma$     0.316     0.264      0.272
Logit-normal     $\mu$        -1.56     -3.1       -4.71
                 $\sigma$     0.553     0.556      0.691
Clayton          $\pi$        0.188     0.049      0.0112
                 $\theta$     0.0704    0.032      0.0247


11.3.2 General Results


Since we are interested in asymptotic properties of the overall loss distribution, we 
also consider exposures and losses given default. Let (e;); <j be an infinite sequence 
of positive deterministic exposures, let (Y;);eņ be the corresponding sequence of 
default indicators, and let (6;);< be a sequence of rvs with values in (0, 1] represent- 
ing percentage losses given that default occurs. In this setting the loss for a portfolio 
of size m is given by LM = Yia 1 Li, where L; = e;ô; Y; are the individual losses. 
We are interested in results for the relative loss L™ / yr, ei, which expresses the 
loss as a proportion of total exposure. We introduce the notation am = )°"_, e; for 
the aggregate exposure to the first m obligors. 
We now make some technical assumptions for our model. 


(Al) There is a p-dimensional random vector W such that, conditional on W, the 
(Li)ien form a sequence of independent rvs. 


In this assumption the conditional independence structure is extended from the 
default indicators to the losses. Note that (A1) allows for the situation where ê; 
depends on the systematic factors W. This extension is relevant from an empirical 
viewpoint since evidence suggests that losses given default tend to depend on the 
state of the underlying economy (see Section 11.2.3). 


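(A2) There is a function $\bar{\ell}\colon \mathbb{R}^p \to [0, 1]$ such that

$$\lim_{m\to\infty} \frac{1}{a_m} E(L^{(m)} \mid W = w) = \bar{\ell}(w) \tag{11.32}$$

for all $w \in \mathbb{R}^p$. We call $\bar{\ell}(w)$ the asymptotic relative loss function.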


Assumption (A2) implies that we preserve the essential composition of the portfolio 
as we allow it to grow (see, for instance, Example 11.20). 


(A3) The sequence of exposures satisfies $\lim_{m\to\infty} a_m = \infty$ and $\sum_{i=1}^{\infty} (e_i/a_i)^2 < \infty$.


This is a very weak technical assumption that would be satisfied by any realistic
sequence of exposures. For example, if $e_i = e$ for all $i$, then $\sum_{i=1}^{\infty} (e_i/a_i)^2 = \sum_{i=1}^{\infty} i^{-2} < \infty$.
Even in the case where $e_i = i$, so that exposures grow linearly
with portfolio size, we have $\sum_{i=1}^{\infty} (e_i/a_i)^2 = \sum_{i=1}^{\infty} (2/(i+1))^2 < \infty$, since
$a_i = \sum_{k=1}^{i} k = i(i+1)/2$. To find a counterexample we would need to take a sequence
of exposures where the cumulative sum grows at the same rate as the maximum of
the first $m$ exposures. In intuitive terms this means that the portfolio is dominated
by a few large exposures (name concentration).
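
The discussion of Assumption (A3) can be checked numerically. The following is a minimal sketch, not part of the original argument: the helper name `a3_partial_sum` and the geometric sequence $e_i = 2^i$ (chosen here as a hypothetical case of name concentration) are assumptions made for illustration. It computes partial sums of $(e_i/a_i)^2$ for the two sequences mentioned in the text and for the geometric one.

```python
from itertools import accumulate


def a3_partial_sum(exposures):
    """Partial sum of (e_i / a_i)^2, where a_i = e_1 + ... + e_i."""
    cumulative = accumulate(exposures)  # a_1, a_2, ...
    return sum((e / a) ** 2 for e, a in zip(exposures, cumulative))


n = 2000
constant = [1.0] * n                            # e_i = e for all i
linear = [float(i) for i in range(1, n + 1)]    # e_i = i
geometric = [2 ** i for i in range(1, n + 1)]   # e_i = 2^i (hypothetical, concentrated)

for name, seq in [("constant", constant), ("linear", linear), ("geometric", geometric)]:
    # Compare the partial sum over 200 and over 2000 exposures.
    print(name, a3_partial_sum(seq[:200]), a3_partial_sum(seq))
```

For the constant and linear sequences the two partial sums are nearly identical, which is consistent with convergence. For the geometric sequence each term tends to 1/4, so the partial sum keeps growing roughly in proportion to the number of exposures and (A3) fails.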

The following result shows that under these assumptions the average portfolio 
loss is essentially determined by the asymptotic relative loss function $\bar{\ell}$ and by the
realization of the factor random vector W. 


Proposition 11.17. Consider a sequence $L^{(m)} = \sum_{i=1}^m L_i$ satisfying Assumptions (A1)-(A3) above. Denote by $P(\cdot \mid W = w)$ the conditional distribution of
the sequence $(L_i)_{i\in\mathbb{N}}$ given $W = w$. Then

$$\lim_{m\to\infty} \frac{1}{a_m} L^{(m)} = \bar{\ell}(w), \quad P(\cdot \mid W = w)\text{-a.s.}$$

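Proof. The proof is based on the following version of the law of large numbers for
non-identically distributed random variables given by Petrov (1995, Theorem 6.7).
If $(Z_i)_{i\in\mathbb{N}}$ is a sequence of independent random variables and $(a_i)_{i\in\mathbb{N}}$ is a sequence
of positive constants satisfying $\lim_{m\to\infty} a_m = \infty$ and $\sum_{i=1}^{\infty} \operatorname{var}(Z_i)/a_i^2 < \infty$,
then, as $m \to \infty$,

$$\frac{1}{a_m} \Big( \sum_{i=1}^m Z_i - E\Big( \sum_{i=1}^m Z_i \Big) \Big) \to 0 \quad \text{a.s.}$$

We set $Z_i = L_i = e_i \delta_i Y_i$ and $a_i = \sum_{k=1}^{i} e_k$ as before. We apply this result
conditional on $W = w$; that is, we work under the measure $P(\cdot \mid W = w)$. Under
this measure the $L_i$ are independent by Assumption (A1). Note that $\operatorname{var}(L_i) = e_i^2 \operatorname{var}(\delta_i Y_i) \le e_i^2$, since $\delta_i Y_i$ is an rv on $[0, 1]$. Using Assumption (A3) we verify
that

$$\sum_{i=1}^{\infty} \frac{\operatorname{var}(Z_i)}{a_i^2} \le \sum_{i=1}^{\infty} \Big( \frac{e_i}{a_i} \Big)^2 < \infty.$$

Applying Petrov's result and Assumption (A2) we get

$$\lim_{m\to\infty} \Big( \frac{1}{a_m} L^{(m)} - \bar{\ell}(w) \Big) = \lim_{m\to\infty} \Big( \frac{1}{a_m} \sum_{i=1}^m L_i - \frac{1}{a_m} E\Big( \sum_{i=1}^m L_i \,\Big|\, W = w \Big) \Big) = 0.$$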


For one-factor Bernoulli mixture models a stronger result can be obtained that 
links the quantiles of the relative portfolio loss $L^{(m)}/a_m$ to quantiles of the mixing
distribution. 


Proposition 11.18. Consider a sequence $L^{(m)} = \sum_{i=1}^m L_i$ satisfying Assumptions (A1)-(A3) with a one-dimensional mixing variable $W$ with df $G$. Assume
that the conditional asymptotic loss function $\bar{\ell}(w)$ is strictly increasing and continuous and that $G$ is strictly increasing at $q_\alpha(W)$, i.e. that $G(q_\alpha(W) + \delta) > \alpha$ for every
$\delta > 0$. Then

$$\lim_{m\to\infty} \frac{1}{a_m} \operatorname{VaR}_\alpha(L^{(m)}) = \bar{\ell}(q_\alpha(W)). \tag{11.33}$$

The assumption that $\bar{\ell}$ is strictly increasing makes sense if it is assumed that low
values of W correspond to good states of the world with lower conditional default 
probabilities and lower losses given default than average, while high values of W 
correspond to bad states with correspondingly higher losses given default. 


Proof. The proof is based on the following simple intuition. Since $L^{(m)}/a_m$ converges to $\bar{\ell}(W)$ and since $\bar{\ell}$ is strictly increasing by assumption, we have for large $m$ that

$$q_\alpha\Big( \frac{L^{(m)}}{a_m} \Big) \approx q_\alpha(\bar{\ell}(W)) = \bar{\ell}(q_\alpha(W)).$$

To turn this into a formal argument we use the following continuity result for quantiles, a proof of which may be found in Fristedt and Gray (1997, Proposition 5,
p. 250) or Resnick (2008, Proposition 0.1).


Lemma 11.19. Consider a sequence of random variables $(Z_m)_{m\in\mathbb{N}}$ that converges
in distribution to a random variable $Z$. Then $\lim_{m\to\infty} q_u(Z_m) = q_u(Z)$ at all points
of continuity $u$ of the quantile function $u \mapsto q_u(Z)$.


In our case, $L^{(m)}/a_m$ converges to $\bar{\ell}(W)$ almost surely, and hence in distribution.
The assumption that the function $\bar{\ell}$ is strictly increasing and that $G$ is strictly increasing at $q_\alpha(W)$ ensures that the distribution function of $\bar{\ell}(W)$ is strictly increasing at
that point so that the quantile function of the rv $\bar{\ell}(W)$ is continuous at $\alpha$. This shows
that

$$\lim_{m\to\infty} \frac{1}{a_m} q_\alpha(L^{(m)}) = q_\alpha(\bar{\ell}(W)).$$

Finally, the equality $q_\alpha(\bar{\ell}(W)) = \bar{\ell}(q_\alpha(W))$ follows from Proposition A.3.


Example 11.20. Consider the one-factor Bernoulli mixture model for $k$ exchangeable groups defined by (11.18). Denote by $r(i)$ the group of obligor $i$ and assume
that, within each group $r$, the exposures, LGDs and conditional default probabilities
are identical and are given by $e_r$, $\delta_r$ and $p_r(w)$, respectively.

Suppose that we allow the portfolio to grow and that we write $m_r^{(m)}$ for the number
of obligors in group $r$ when the portfolio size is $m$. The relative exposure to group
$r$ is given by $\lambda_r^{(m)} = e_r m_r^{(m)} / \sum_{r=1}^{k} e_r m_r^{(m)}$, and we assume that $\lambda_r^{(m)} \to \lambda_r$ as
$m \to \infty$. In this case the asymptotic relative loss function in equation (11.32) is

$$\bar{\ell}(w) = \lim_{m\to\infty} \sum_{i=1}^{m} \frac{e_{r(i)}}{\sum_{j=1}^{m} e_{r(j)}}\, \delta_{r(i)}\, p_{r(i)}(w) = \sum_{r=1}^{k} \lambda_r \delta_r\, h(\mu_r + \sigma_r w).$$

Since $W$ is assumed to have a standard normal distribution, (11.33) implies that

$$\lim_{m\to\infty} q_\alpha\Big( \frac{L^{(m)}}{\sum_{i=1}^{m} e_i} \Big) = \sum_{r=1}^{k} \lambda_r \delta_r\, h(\mu_r + \sigma_r \Phi^{-1}(\alpha)). \tag{11.34}$$

For large $m$, since $\lambda_r \sum_{i=1}^{m} e_i \approx m_r^{(m)} e_r$, we get that

$$\operatorname{VaR}_\alpha(L^{(m)}) \approx \sum_{r=1}^{k} m_r^{(m)} e_r \delta_r\, h(\mu_r + \sigma_r \Phi^{-1}(\alpha)). \tag{11.35}$$
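
The approximation (11.35) is straightforward to evaluate once the group parameters are known. The sketch below assumes a probit link $h = \Phi$; the function name `asymptotic_var` and the two-group portfolio are hypothetical and serve only to illustrate the formula.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def asymptotic_var(alpha, groups):
    """Large-portfolio VaR approximation in the form of (11.35), with h = Phi.

    groups: iterable of (m_r, e_r, delta_r, mu_r, sigma_r) tuples, one per group.
    """
    z = _STD_NORMAL.inv_cdf(alpha)  # alpha-quantile of the standard normal factor
    return sum(
        m_r * e_r * delta_r * _STD_NORMAL.cdf(mu_r + sigma_r * z)
        for m_r, e_r, delta_r, mu_r, sigma_r in groups
    )


# Hypothetical two-group portfolio (all numbers are illustrative only).
groups = [
    (1000, 1.0, 0.6, -2.37, 0.272),
    (500, 2.0, 0.4, -1.71, 0.264),
]
print(asymptotic_var(0.99, groups))
```

Because the formula is a sum over groups, each summand can be read as the contribution of one group to the asymptotic VaR.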




11.3.3 The Basel IRB Formula 


In this section we examine how the considerations of Sections 11.3.1 and 11.3.2 
have influenced the Basel capital adequacy framework, which was discussed in 
more general terms in Section 1.3. Under this framework a bank is required to hold 
8% of the so-called risk-weighted assets (RWA) of its credit portfolio as risk capital. 
The RWA of a portfolio is given by the sum of the RWA of the individual risks in the
portfolio, i.e. $\mathrm{RWA}_{\text{portfolio}} = \sum_{i=1}^{m} \mathrm{RWA}_i$. The quantity $\mathrm{RWA}_i$ reflects the exposure
size and the riskiness of obligor $i$ and takes the form $\mathrm{RWA}_i = w_i e_i$, where $w_i$ is a
risk weight and $e_i$ denotes exposure size.

Banks may choose between two options for determining the risk weight $w_i$, which
must then be implemented for the entire portfolio. Under the simpler standardized
approach, the risk weight $w_i$ is determined by the type (sovereign, bank or corporate)
and the credit rating of counterparty $i$. For instance, $w_i = 50\%$ for a corporation
with a Moody's rating in the range A+ to A-. Under the more advanced internal-ratings-based (IRB) approach, the risk weight takes the form


$$w_i = (0.08)^{-1}\, c\, \delta_i\, \Phi\Big( \frac{\Phi^{-1}(p_i) + \sqrt{\beta_i}\, \Phi^{-1}(0.999)}{\sqrt{1 - \beta_i}} \Big). \tag{11.36}$$

Here, $c$ is a technical adjustment factor that is of minor interest to us, $p_i$ represents
the default probability, and $\delta_i$ is the percentage loss given default of obligor $i$. The
parameter $\beta_i \in (0.12, 0.24)$ measures the systematic risk of obligor $i$. Estimates
for $p_i$ and (under the so-called advanced IRB approach) for $\delta_i$ and $e_i$ are provided
by the individual bank; the adjustment factor $c$ and, most importantly, the value of
$\beta_i$ are determined by fixed rules within the Basel II Accord independently of the
structure of the specific portfolio under consideration. The risk capital to be held for
counterparty $i$ is thus given by

$$\mathrm{RC}_i = 0.08\, \mathrm{RWA}_i = c\, \delta_i e_i\, \Phi\Big( \frac{\Phi^{-1}(p_i) + \sqrt{\beta_i}\, \Phi^{-1}(0.999)}{\sqrt{1 - \beta_i}} \Big). \tag{11.37}$$


The interesting part of equation (11.37) is, of course, the expression involving the 
standard normal df, and we now give a derivation. 

Consider a one-factor Gaussian threshold model with default probabilities $p_1, \ldots, p_m$
and critical variables given by

$$X_i = \sqrt{\beta_i}\, F + \sqrt{1 - \beta_i}\, \varepsilon_i \tag{11.38}$$

for iid standard normal rvs $F, \varepsilon_1, \ldots, \varepsilon_m$. By taking $W = -F$, the equivalent
Bernoulli mixture model was shown in Section 11.2.4 to have conditional default
probabilities $p_i(w) = \Phi((\Phi^{-1}(p_i) + \sqrt{\beta_i}\, w)/\sqrt{1 - \beta_i})$. Note that this is of the form
$p_i(w) = h(\mu_i + \sigma_i w)$ for $h = \Phi$, $\mu_i = \Phi^{-1}(p_i)/\sqrt{1 - \beta_i}$ and $\sigma_i = \sqrt{\beta_i/(1 - \beta_i)}$.
Assume, moreover, that the portfolio has a homogeneous group structure consisting
of a few large groups with (approximately) identical exposures, PDs, LGDs and
factor weights within the groups, as in Example 11.20.

Applying the analysis of that example and, in particular, equation (11.35), it
follows that

$$\operatorname{VaR}_\alpha(L^{(m)}) \approx \sum_{i=1}^{m} \delta_i e_i\, p_i(q_\alpha(W)) = \sum_{i=1}^{m} \delta_i e_i\, \Phi\Big( \frac{\Phi^{-1}(p_i) + \sqrt{\beta_i}\, \Phi^{-1}(\alpha)}{\sqrt{1 - \beta_i}} \Big).$$

For $c = 1$ the risk capital $\mathrm{RC}_i$ in (11.37) can thus be considered as the asymptotic
contribution of risk $i$ to the 99.9% VaR of the overall portfolio in a one-factor
Gaussian threshold model with homogeneous group structure. Note, further, that $\beta_i$
can be viewed as the asset correlation for firms $i$ within the same group.
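
As a small numerical illustration of (11.36) and (11.37), the following sketch evaluates the risk capital for a single obligor. The function names, the default $c = 1$ and the example inputs are assumptions made here; the actual regulatory rules for $c$ and $\beta_i$ are not reproduced.

```python
from math import sqrt
from statistics import NormalDist

_N = NormalDist()


def irb_risk_capital(exposure, pd, lgd, beta, alpha=0.999, c=1.0):
    """Risk capital in the form of (11.37): c * lgd * exposure * Phi(...)."""
    arg = (_N.inv_cdf(pd) + sqrt(beta) * _N.inv_cdf(alpha)) / sqrt(1.0 - beta)
    return c * lgd * exposure * _N.cdf(arg)


def irb_risk_weight(pd, lgd, beta, alpha=0.999, c=1.0):
    """Risk weight w_i in the form of (11.36), so that RC_i = 0.08 * w_i * e_i."""
    return irb_risk_capital(1.0, pd, lgd, beta, alpha, c) / 0.08


# Hypothetical obligor: exposure 100, PD 1%, LGD 45%, for several values of beta.
for beta in (0.0, 0.12, 0.24):
    print(beta, irb_risk_capital(100.0, 0.01, 0.45, beta))
```

With $\beta_i = 0$ the expression collapses to the expected loss $\delta_i e_i p_i$, and the capital increases as the systematic risk parameter grows.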

While formula (11.36) is influenced by portfolio-theoretic considerations, the 
Basel framework falls short of reflecting the true dependence structure of a bank’s 
credit portfolio for a number of reasons. First, in the Basel framework the parameters 
$\beta_i$ are specified ad hoc by regulatory rules irrespective of the composition of the portfolio at hand. Second, the homogeneous group structure and the simple one-factor
model (11.38) are typically oversimplified representations of the factor structure 
underlying default dependence, particularly for internationally active banks. Third, 
the rule is based on an asymptotic result. Moreover, historical default experience 
for the portfolio under consideration has no formal role to play in setting capital 
adequacy standards. These deficiencies should be weighed against the relative sim- 
plicity of the IRB approach, which makes it suitable for use in a supervisory setting. 
For economic capital purposes, on the other hand, most banks develop fully internal 
models with more sophisticated factor models to describe dependencies. 


Notes and Comments 


The results in Section 11.3 are an amalgamation of results from Frey and McNeil 
(2003) and Gordy (2003). The first limit result for large portfolios was obtained in 
Vasicek (1997) for a probit-normal mixture model equivalent to the KMV model. 
Asymptotic results for credit portfolios related to the theory of large deviations are 
discussed in Dembo, Deuschel and Duffie (2004). For details of the IRB approach, 
and the Basel II Capital Accord in general, we refer to the website of the Basel 
Committee: www.bis.org/bcbs. Our discussion in Section 11.3.3 is related to the 
analysis by Gordy (2003). 

There have been a number of papers on second-order corrections or so-called gran- 
ularity adjustments to the large-portfolio results in Propositions 11.17 and 11.18. 
While the results assume that idiosyncratic risk diversifies away in sufficiently large 
portfolios, these corrections take into account the fact that a certain amount of 
idiosyncratic risk and name concentration will remain in real portfolios. References 
include Martin and Wilde (2002), Gordy (2004), Gordy and Marrone (2012), Gordy
and Lütkebohmert (2013) and Gagliardini and Gouriéroux (2013). See also the book
by Lütkebohmert (2009) on concentration risk in credit portfolios.

11.4 Monte Carlo Methods


In this section we consider a Bernoulli mixture model for a loan portfolio and assume
that the overall loss is of the form $L = \sum_{i=1}^{m} L_i$, where the $L_i$ are conditionally independent given some economic factor vector $W$. A possible method for calculating
risk measures and related quantities such as capital allocations is to use Monte Carlo
(MC) simulation, although the problem of rare-event simulation arises. Suppose,
for example, that we wish to compute expected shortfall and expected shortfall
contributions at the confidence level $\alpha$ for our portfolio. We need to evaluate the
conditional expectations

$$E(L \mid L \ge q_\alpha(L)) \quad \text{and} \quad E(L_i \mid L \ge q_\alpha(L)). \tag{11.39}$$


If $\alpha = 0.99$, say, then only 1% of our standard Monte Carlo draws will lead to a
portfolio loss higher than $q_{0.99}(L)$. The standard MC estimator of (11.39), which
consists of averaging the simulated values of $L$ or $L_i$ over all draws leading to a
simulated portfolio loss $L \ge q_\alpha(L)$, will be unstable and subject to high variability
unless the number of simulations is very large. The problem is of course that most
simulations are "wasted", in that they lead to a value of $L$ that is smaller than $q_\alpha(L)$.
Fortunately, there exists a variance-reduction technique known as importance sampling (IS), which is well suited to such problems.


11.4.1 Basics of Importance Sampling 


Consider an rv $X$ on some probability space $(\Omega, \mathcal{F}, P)$ and assume that it has an
absolutely continuous df with density $f$. A generalization to general probability
spaces is discussed below. The problem we consider is the computation of the
expected value

$$\theta = E(h(X)) = \int_{-\infty}^{\infty} h(x) f(x)\, dx \tag{11.40}$$

for some known function $h$. To calculate the probability of an event we consider
a function of the form $h(x) = I_{\{x \in A\}}$ for some set $A \subset \mathbb{R}$; for expected shortfall
computation we consider functions of the form $h(x) = x I_{\{x \ge c\}}$ for some $c \in \mathbb{R}$.
Where the analytical evaluation of (11.40) is difficult, due to the complexity of the
distribution of $X$, we can resort to an MC approach, for which we only have to be
able to simulate variates from the distribution with density $f$.

Algorithm 11.21 (Monte Carlo integration).

(1) Generate $X_1, \ldots, X_n$ independently from density $f$.
(2) Compute the standard MC estimate $\hat{\theta}_n^{\mathrm{MC}} = (1/n) \sum_{i=1}^{n} h(X_i)$.


The MC estimator converges to $\theta$ by the strong law of large numbers, but
convergence may be slow, especially when we are dealing
with rare-event simulation.
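
The slow convergence in the rare-event case is easy to see in a small experiment. The following sketch implements the plain Monte Carlo estimate for a tail probability of a standard normal rv; the function name `mc_estimate`, the threshold $c = 3$ and the seed are assumptions chosen for illustration.

```python
import random
from statistics import NormalDist


def mc_estimate(h, sampler, n):
    """Plain Monte Carlo: average h over n independent draws produced by sampler()."""
    return sum(h(sampler()) for _ in range(n)) / n


rng = random.Random(1)
c = 3.0
exact = 1.0 - NormalDist().cdf(c)  # P(X >= c) for X ~ N(0, 1), about 0.00135
for n in (1_000, 10_000, 100_000):
    est = mc_estimate(lambda x: 1.0 if x >= c else 0.0, lambda: rng.gauss(0.0, 1.0), n)
    print(n, est, exact)
```

Since only about 0.13% of the draws exceed $c = 3$, the estimate based on 1000 draws rests on a handful of exceedances at most and is correspondingly noisy.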

Importance sampling is based on an alternative representation of the integral
in (11.40). Consider a second probability density $g$ (whose support should contain
that of $f$) and define the likelihood ratio $r(x)$ by $r(x) := f(x)/g(x)$ whenever
$g(x) > 0$, and $r(x) = 0$ otherwise. The integral (11.40) may be written in terms of
the likelihood ratio as

$$\theta = \int_{-\infty}^{\infty} h(x) r(x) g(x)\, dx = E_g(h(X) r(X)), \tag{11.41}$$

where $E_g$ denotes expectation with respect to the density $g$. We can therefore approximate the integral with the following algorithm.


Algorithm 11.22 (importance sampling).
(1) Generate $X_1, \ldots, X_n$ independently from density $g$.
(2) Compute the IS estimate $\hat{\theta}_n^{\mathrm{IS}} = (1/n) \sum_{i=1}^{n} h(X_i) r(X_i)$.


The density g is often termed the importance-sampling density. The art (or sci- 
ence) of importance sampling is in choosing an importance-sampling density such 
that, for fixed n, the variance of the IS estimator is considerably smaller than that of 
the standard Monte Carlo estimator. In this way we can hope to obtain a prescribed 
accuracy in evaluating the integral of interest using far fewer random draws than are 
required in standard Monte Carlo simulation. The variances of the estimators are 
given by 


$$\operatorname{var}_g(\hat{\theta}_n^{\mathrm{IS}}) = (1/n)\big( E_g(h(X)^2 r(X)^2) - \theta^2 \big),$$
$$\operatorname{var}(\hat{\theta}_n^{\mathrm{MC}}) = (1/n)\big( E(h(X)^2) - \theta^2 \big),$$

so the aim is to make $E_g(h(X)^2 r(X)^2)$ small compared with $E(h(X)^2)$. In theory,
the variance of $\hat{\theta}_n^{\mathrm{IS}}$ can be reduced to zero by choosing an optimal $g$. To see this,
suppose for the moment that $h$ is non-negative and set

$$g^*(x) = f(x) h(x) / E(h(X)). \tag{11.42}$$

With this choice, the likelihood ratio becomes $r(x) = E(h(X))/h(x)$. Hence
$\hat{\theta}_1^{\mathrm{IS}} = h(X_1) r(X_1) = E(h(X))$, and the IS estimator gives the correct answer in
a single draw. In practice, it is of course impossible to choose an IS density of the 
form (11.42), as this requires knowledge of the quantity E(h(X)) that one wants 
to compute; nonetheless, (11.42) can provide useful guidance in choosing an IS 
density, as we will see in the next section. 

Consider the case of estimating a rare-event probability corresponding to
$h(x) = I_{\{x \ge c\}}$ for $c$ significantly larger than the mean of $X$. Then we have that
$E(h(X)^2) = P(X \ge c)$ and, using (11.41), that

$$E_g(h(X)^2 r(X)^2) = E_g(r(X)^2; X \ge c) = E(r(X); X \ge c). \tag{11.43}$$

Clearly, we should try to choose $g$ such that the likelihood ratio $r(x) = f(x)/g(x)$
is small for $x \ge c$; in other words, we should make the event $\{X \ge c\}$ more likely
under the IS density $g$ than it is under the original density $f$.

Exponential tilting. We now describe a useful way of finding IS densities when
$X$ is light tailed. For $t \in \mathbb{R}$ we write $M_X(t) = E(e^{tX}) = \int_{-\infty}^{\infty} e^{tx} f(x)\, dx$ for the
moment-generating function of $X$, which we assume is finite for $t \in \mathbb{R}$. If $M_X(t)$ is
finite, we can define an IS density by $g_t(x) := e^{tx} f(x)/M_X(t)$. The likelihood ratio
is $r_t(x) = f(x)/g_t(x) = M_X(t) e^{-tx}$. Define $\mu_t$ to be the mean of $X$ with respect
to the density $g_t$, i.e.

$$\mu_t = E_{g_t}(X) = E(X e^{tX})/M_X(t). \tag{11.44}$$

How can we choose $t$ optimally for a particular IS problem? We consider the case
of tail probability estimation and recall from (11.43) that the objective is to make

$$E(r_t(X); X \ge c) = E(I_{\{X \ge c\}} M_X(t) e^{-tX}) \tag{11.45}$$

small. Now observe that $e^{-tx} \le e^{-tc}$ for $x \ge c$ and $t \ge 0$, so

$$E(I_{\{X \ge c\}} M_X(t) e^{-tX}) \le M_X(t) e^{-tc}.$$

Instead of solving the (difficult) problem of minimizing (11.45) over $t$, we choose
$t$ such that this bound becomes minimal. Equivalently, we try to find $t$ minimizing
$\ln M_X(t) - tc$. Using (11.44) we obtain that

$$\frac{d}{dt}\big( \ln M_X(t) - tc \big) = \frac{E(X e^{tX})}{M_X(t)} - c = \mu_t - c,$$

which suggests choosing $t = t(c)$ as the solution of the equation $\mu_t = c$, so that the
rare event $\{X \ge c\}$ becomes a normal event if we compute probabilities using the
density $g_{t(c)}$. A unique solution of the equation $\mu_t = c$ exists for all relevant values
of $c$. In the cases that are of interest to us this is immediately obvious from the form
of the exponentially tilted distributions, so we omit a formal proof.


Example 11.23 (exponential tilting for normal distribution). We illustrate the
concept of exponential tilting in the simple case of a standard normal rv. Suppose
that $X \sim N(0, 1)$ with density $\phi(x)$. Using exponential tilting we obtain the new
density $g_t(x) = e^{tx} \phi(x)/M_X(t)$. The moment-generating function of $X$ is known
to be $M_X(t) = e^{t^2/2}$. Hence

$$g_t(x) = \frac{1}{\sqrt{2\pi}} \exp\big( tx - \tfrac{1}{2}(t^2 + x^2) \big) = \frac{1}{\sqrt{2\pi}} \exp\big( -\tfrac{1}{2}(x - t)^2 \big),$$

so that, under the tilted distribution, $X \sim N(t, 1)$. Note that in this case exponential
tilting corresponds to changing the mean of $X$.
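
For the standard normal case the tilted mean is $\mu_t = t$, so the rule $\mu_t = c$ gives $t = c$. The sketch below applies importance sampling with this tilt to the tail probability $P(X \ge c)$; the function name `is_normal_tail`, the threshold $c = 4$, the sample size and the seed are assumptions chosen for illustration.

```python
import math
import random
from statistics import NormalDist


def is_normal_tail(c, n, rng, t=None):
    """IS estimate of P(X >= c), X ~ N(0, 1), sampling from the tilted law N(t, 1)."""
    t = c if t is None else t      # for the normal, mu_t = t, so mu_t = c means t = c
    log_mgf = 0.5 * t * t          # ln M_X(t) = t^2 / 2
    total = 0.0
    for _ in range(n):
        x = rng.gauss(t, 1.0)      # draw under the tilted density g_t
        if x >= c:
            total += math.exp(log_mgf - t * x)  # likelihood ratio M_X(t) * exp(-t x)
    return total / n


rng = random.Random(42)
c = 4.0
print(is_normal_tail(c, 20_000, rng), 1.0 - NormalDist().cdf(c))
```

Under the tilted density roughly half of the draws land in the region $\{x \ge c\}$, whereas plain Monte Carlo with the same number of draws would typically see at most one or two exceedances of $c = 4$.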


An abstract view of importance sampling. To handle the more complex application 
to portfolio credit risk in the next section it helps to consider importance sampling 
from a slightly more general viewpoint. Given densities f and g as above, define 
probability measures P and Q by 


$$P(A) = \int_A f(x)\, dx \quad \text{and} \quad Q(A) = \int_A g(x)\, dx, \quad A \subset \mathbb{R}.$$

With this notation, (11.41) becomes $\theta = E^P(h(X)) = E^Q(h(X) r(X))$, so that
$r(X)$ equals $dP/dQ$, the (measure-theoretic) density of $P$ with respect to $Q$. Using
this more abstract view, exponential tilting can be applied in more general situations:
given an rv $X$ on $(\Omega, \mathcal{F}, P)$ such that $M_X(t) = E^P(e^{tX}) < \infty$, define the measure
$Q_t$ on $(\Omega, \mathcal{F})$ by

$$\frac{dQ_t}{dP} = \frac{e^{tX}}{M_X(t)}, \quad \text{i.e.} \quad Q_t(A) = E^P\Big( \frac{e^{tX}}{M_X(t)}; A \Big),$$

and note that $(dQ_t/dP)^{-1} = M_X(t) e^{-tX} = r_t(X)$. The IS algorithm remains
essentially unchanged: simulate independent realizations $X_i$ under the measure $Q_t$
and set $\hat{\theta}_n^{\mathrm{IS}} = (1/n) \sum_{i=1}^{n} h(X_i) r_t(X_i)$ as before.


11.4.2 Application to Bernoulli Mixture Models 


In this section we return to the subject of credit losses and consider a portfolio loss of
the form $L = \sum_{i=1}^{m} e_i Y_i$, where the $e_i$ are deterministic, positive exposures and the
$Y_i$ are default indicators with default probabilities $p_i$. We assume that $Y$ follows a
Bernoulli mixture model in the sense of Definition 11.5 with factor vector $W$ and conditional default probabilities $p_i(W)$. We study the problem of estimating exceedance
probabilities $\theta = P(L \ge c)$ for $c$ substantially larger than $E(L)$ using importance
sampling. This is useful for risk-management purposes, as, for $c \approx q_\alpha(L)$, a good
IS distribution for the computation of $P(L \ge c)$ also yields a substantial variance
reduction for computing expected shortfall or expected shortfall contributions.

We consider first the situation where the default indicators $Y_1, \ldots, Y_m$ are independent, and then we discuss the extension to the case of conditionally independent
default indicators. Our exposition is based on Glasserman and Li (2005).


Independent default indicators. Here we use the more general IS approach outlined at the end of the previous section. Set $\Omega = \{0, 1\}^m$, the state space of $Y$. The
probability measure $P$ is given by

$$P(\{y\}) = \prod_{i=1}^{m} p_i^{y_i} (1 - p_i)^{1 - y_i}, \quad y \in \{0, 1\}^m.$$

We need to understand how this measure changes under exponential tilting using $L$.
The moment-generating function of $L$ is easily calculated to be

$$M_L(t) = E\Big( \exp\Big( t \sum_{i=1}^{m} e_i Y_i \Big) \Big) = \prod_{i=1}^{m} E(e^{t e_i Y_i}) = \prod_{i=1}^{m} (e^{t e_i} p_i + 1 - p_i).$$

The measure $Q_t$ is given by $Q_t(\{y\}) = E^P(e^{tL}/M_L(t); Y = y)$ and hence

$$Q_t(\{y\}) = \frac{\exp(t \sum_{i=1}^{m} e_i y_i)}{M_L(t)} P(\{y\}) = \prod_{i=1}^{m} \frac{e^{t e_i y_i}}{e^{t e_i} p_i + 1 - p_i}\, p_i^{y_i} (1 - p_i)^{1 - y_i}.$$

Define new default probabilities by $q_{t,i} := e^{t e_i} p_i / (e^{t e_i} p_i + 1 - p_i)$. It follows that
$Q_t(\{y\}) = \prod_{i=1}^{m} q_{t,i}^{y_i} (1 - q_{t,i})^{1 - y_i}$, so that after exponential tilting the default indicators remain independent but with new default probability $q_{t,i}$. Note that $q_{t,i}$ tends
to 1 for $t \to \infty$ and to 0 for $t \to -\infty$, so that we can shift the mean of $L$ to any
point in $(0, \sum_{i=1}^{m} e_i)$.

In analogy with our previous discussion, for IS purposes, the optimal value of $t$
is chosen such that $E^{Q_t}(L) = c$, leading to the equation $\sum_{i=1}^{m} e_i q_{t,i} = c$.
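
The tilting recipe for independent default indicators can be sketched in a few lines. In the code below the function names, the bisection bracket $[0, 50]$ (suited to exposures of order one and thresholds below the total exposure) and the small example portfolio are all assumptions made for illustration; the rule of using no tilt when the untilted mean already exceeds the threshold is a safeguard added here.

```python
import math
import random


def tilted_probs(t, exposures, probs):
    """Default probabilities q_{t,i} after exponential tilting with parameter t."""
    out = []
    for e, p in zip(exposures, probs):
        w = math.exp(t * e) * p
        out.append(w / (w + 1.0 - p))
    return out


def solve_tilt(c, exposures, probs, lo=0.0, hi=50.0, iters=100):
    """Bisection for t >= 0 with sum_i e_i q_{t,i} = c (the tilted mean increases in t)."""
    def mean(t):
        return sum(e * q for e, q in zip(exposures, tilted_probs(t, exposures, probs)))

    if mean(lo) >= c:  # the event is not rare: use no tilting
        return lo
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if mean(mid) < c else (lo, mid)
    return 0.5 * (lo + hi)


def is_exceedance_prob(c, exposures, probs, n, rng):
    """IS estimate of P(L >= c) for independent defaults, L = sum_i e_i Y_i."""
    t = solve_tilt(c, exposures, probs)
    q = tilted_probs(t, exposures, probs)
    log_mgf = sum(math.log(math.exp(t * e) * p + 1.0 - p) for e, p in zip(exposures, probs))
    total = 0.0
    for _ in range(n):
        loss = sum(e for e, qi in zip(exposures, q) if rng.random() < qi)
        if loss >= c:
            total += math.exp(log_mgf - t * loss)  # likelihood ratio M_L(t) * exp(-t L)
    return total / n


# Hypothetical portfolio: ten obligors, exposures 1 to 5 (twice), default probability 5%.
exposures = [1.0, 2.0, 3.0, 4.0, 5.0] * 2
probs = [0.05] * 10
print(is_exceedance_prob(20.0, exposures, probs, 20_000, random.Random(3)))
```

Bisection is used because the tilted mean $\sum_i e_i q_{t,i}$ is increasing in $t$; any other one-dimensional root finder would serve equally well.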


Conditionally independent default indicators. The first step in the extension of
the importance-sampling approach to conditionally independent defaults is obvious:
given a realization $w$ of the economic factors, the conditional exceedance probability
$\theta(w) := P(L \ge c \mid W = w)$ is estimated using the approach for independent
default indicators described above. We have the following algorithm.

Algorithm 11.24 (IS for conditional loss distribution).

(1) Given $w$, calculate the conditional default probabilities $p_i(w)$ according to
the particular model, and solve the equation

$$\sum_{i=1}^{m} e_i\, \frac{e^{t e_i} p_i(w)}{e^{t e_i} p_i(w) + 1 - p_i(w)} = c;$$

the solution $t = t(c, w)$ gives the optimal degree of tilting.

(2) Generate $n_1$ conditional realizations of the default vector $Y = (Y_1, \ldots, Y_m)'$.
The defaults of the companies are simulated independently, with the default
probability of the $i$th company given by

$$\frac{\exp(t(c, w) e_i)\, p_i(w)}{\exp(t(c, w) e_i)\, p_i(w) + 1 - p_i(w)}.$$

(3) Denote by $M_L(t, w) := \prod_{i=1}^{m} \{ e^{t e_i} p_i(w) + 1 - p_i(w) \}$ the conditional
moment-generating function of $L$. From the simulated default data construct
$n_1$ conditional realizations of $L = \sum_{i=1}^{m} e_i Y_i$ and label these $L^{(1)}, \ldots, L^{(n_1)}$.
Determine the IS estimator for the conditional loss distribution:

$$\hat{\theta}_{n_1}^{\mathrm{IS}}(w) = M_L(t(c, w), w)\, \frac{1}{n_1} \sum_{j=1}^{n_1} I_{\{L^{(j)} \ge c\}} \exp(-t(c, w) L^{(j)}).$$


In principle, the approach discussed above also applies in the more general situ- 
ation where the loss given default is random; all we need to assume is that the $L_i$
are conditionally independent given W, as in Assumption (A1) of Section 11.3. 
However, the actual implementation can become quite involved. 


IS for the distribution of the factor variables. Suppose we now want to estimate the
unconditional probability $\theta = P(L \ge c)$. A naive approach would be to generate
realizations of the factor vector $W$ and to estimate $\theta$ by averaging the IS estimator of
Algorithm 11.24 over these realizations. As is shown in Glasserman and Li (2005),
this is not the best solution for large portfolios of dependent credit risks. Intuitively,
this is due to the fact that for such portfolios most of the variation in $L$ is caused by
fluctuations of the economic factors, and we have not yet applied IS to the distribution
of $W$. For this reason we now discuss a full IS algorithm that combines IS for the
economic factor variables with Algorithm 11.24.

We consider the important case of a Bernoulli mixture model with multivariate
Gaussian factors and conditional default probabilities $p_i(W)$ for $W \sim N_p(0, \Omega)$,
such as the probit-normal Bernoulli mixture model described in Example 11.11.
In this context it is natural to choose an importance-sampling density such that
$W \sim N_p(\mu, \Omega)$ for a new mean vector $\mu \in \mathbb{R}^p$, i.e. we take $g$ as the density of
$N_p(\mu, \Omega)$. For a good choice of $\mu$ we expect to generate realizations of $W$ leading to
high conditional default probabilities more frequently. The corresponding likelihood
ratio $r_\mu(W)$ is given by the ratio of the respective multivariate normal densities, so
that

$$r_\mu(W) = \frac{\exp(-\tfrac{1}{2} W' \Omega^{-1} W)}{\exp(-\tfrac{1}{2} (W - \mu)' \Omega^{-1} (W - \mu))} = \exp\big( -\mu' \Omega^{-1} W + \tfrac{1}{2} \mu' \Omega^{-1} \mu \big).$$

Essentially, this is a multivariate analogue of the exponential tilting applied to a
univariate normal distribution in Example 11.23.

Now we can describe the algorithm for full IS. At the outset we have to choose
the overall number of simulation rounds, $n$, the number of repetitions of conditional
IS per simulation round, $n_1$, and the mean of the IS distribution for the factors, $\mu$.
Whereas the value of $n$ depends on the desired degree of precision and is best
determined in a simulation study, $n_1$ should be taken to be fairly small. An approach
to determine a sensible value of $\mu$ is discussed below.


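Algorithm 11.25 (full IS for mixture models with Gaussian factors).
(1) Generate $W_1, \ldots, W_n \sim N_p(\mu, \Omega)$.
(2) For each $W_i$ calculate $\hat{\theta}_{n_1}^{\mathrm{IS}}(W_i)$ as in Algorithm 11.24.
(3) Determine the full IS estimator:

$$\hat{\theta}_n^{\mathrm{IS}} = \frac{1}{n} \sum_{i=1}^{n} r_\mu(W_i)\, \hat{\theta}_{n_1}^{\mathrm{IS}}(W_i).$$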


Choosing $\mu$. A key point in the full IS approach is the determination of a value for
$\mu$ that gives a low variance for the IS estimator. Here we sketch the solution proposed
by Glasserman and Li (2005). Since $\hat{\theta}_{n_1}^{\mathrm{IS}}(w) \approx P(L \ge c \mid W = w)$, applying IS
to the factors essentially amounts to finding a good IS density for the function
$w \mapsto P(L \ge c \mid W = w)$. Now recall from our discussion in the previous section
that the optimal IS density $g^*$ satisfies

$$g^*(w) \propto P(L \ge c \mid W = w) \exp(-\tfrac{1}{2} w' \Omega^{-1} w), \tag{11.46}$$

where "$\propto$" stands for "proportional to". Sampling from that density is obviously not
feasible, as the normalizing constant involves the exceedance probability $P(L \ge c)$
that we are interested in. In this situation the authors suggest using a multivariate
normal density with the same mode as $g^*$ as an approximation to the optimal IS
density. Since a normal density attains its mode at the mean $\mu$, this amounts to
choosing $\mu$ as the solution to the optimization problem

$$\max_{w} P(L \ge c \mid W = w) \exp(-\tfrac{1}{2} w' \Omega^{-1} w). \tag{11.47}$$

An exact (numerical) solution of (11.47) is difficult because the function $P(L \ge c \mid W = w)$ is usually not available in closed form. Glasserman and Li (2005) discuss
several approaches to overcoming this difficulty (see their paper for details).


Example 11.26. We give a very simple example of IS to show the gains that can be
obtained by applying IS at both the level of the factor variables and the level of the
conditional loss distribution.

Consider an exchangeable one-factor Bernoulli mixture model in which the factor
$W$ is standard normally distributed and the conditional probability of default is given
by

$$p_i(w) = P(Y_i = 1 \mid W = w) = \Phi\Big( \frac{\Phi^{-1}(p) + \sqrt{\rho}\, w}{\sqrt{1 - \rho}} \Big)$$

for all obligors $i$. Let the unconditional default probability be $p = 0.05$, let the
asset correlation be $\rho = 0.05$ and consider $m = 100$ obligors, each with an identical
exposure $e_i = 1$. We are interested in the probability $\theta = P(L \ge 20)$, where
$L = \sum_{i=1}^{m} Y_i$. In this set-up we can calculate, using numerical integration, that
$\theta \approx 0.00112$, so $\{L \ge 20\}$ is a relatively rare event. In the first panel of Figure 11.3
we apply naive Monte Carlo estimation of $\theta$ and plot $\hat{\theta}_n^{\mathrm{MC}}$ against $n$; the true value
is shown by a horizontal line.

In the second panel we apply importance sampling at the level of the factor $W$
using the value $\mu = -2.8$ for the mean of the distribution of $W$ under $Q$ and plot the
resulting estimate for different values of $n$, the number of random draws of the factor.
In the third panel we apply IS at the level of the conditional loss distribution, using
Algorithm 11.24 with $n_1 = 50$, but we apply naive Monte Carlo to the distribution
of the factor.

In the final panel we apply full IS using Algorithm 11.25, and we plot $\hat{\theta}_n^{\mathrm{IS}}$ against
$n$ using $n_1 = 50$ as before. This is clearly the only estimate that appears to have
converged to the true value by the time we have sampled $n = 10000$ values of the
factor.
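
The set-up of this example is small enough to be reproduced in a short script. The sketch below is not the authors' implementation and rests on several assumptions that should be read with care: the factor mean of the IS distribution is taken as 2.8 with a positive sign, because the conditional default probability as written increases in the factor; the closed-form tilt relies on the identical unit exposures of the example; the tilt is set to zero whenever the conditional default probability already exceeds $c/m$; and all function names, sample sizes and the seed are hypothetical.

```python
import math
import random
from statistics import NormalDist

N01 = NormalDist()
P, RHO, M, C = 0.05, 0.05, 100, 20  # p, rho, number of obligors, loss threshold


def cond_pd(w):
    """Conditional default probability of the exchangeable probit-normal model."""
    return N01.cdf((N01.inv_cdf(P) + math.sqrt(RHO) * w) / math.sqrt(1.0 - RHO))


def exact_theta(step=0.01, lim=8.0):
    """P(L >= C): integrate the conditional binomial tail against the N(0, 1) density."""
    total = 0.0
    for j in range(int(2 * lim / step) + 1):
        w = -lim + j * step
        p = cond_pd(w)
        tail = sum(math.comb(M, i) * p ** i * (1 - p) ** (M - i) for i in range(C, M + 1))
        total += tail * N01.pdf(w) * step
    return total


def conditional_is(w, n1, rng):
    """Conditional IS for identical unit exposures, where the tilt has a closed form."""
    p, q = cond_pd(w), C / M  # the tilted default probability q solves M * q = C
    t = max(0.0, math.log(q * (1 - p) / (p * (1 - q))))  # no tilting if p >= C / M
    q = math.exp(t) * p / (math.exp(t) * p + 1 - p)
    log_mgf = M * math.log(math.exp(t) * p + 1 - p)
    total = 0.0
    for _ in range(n1):
        loss = sum(1 for _ in range(M) if rng.random() < q)
        if loss >= C:
            total += math.exp(log_mgf - t * loss)
    return total / n1


def full_is(n, n1, mu, rng):
    """Full IS with a one-dimensional N(mu, 1) importance-sampling factor."""
    total = 0.0
    for _ in range(n):
        w = rng.gauss(mu, 1.0)
        r = math.exp(-mu * w + 0.5 * mu * mu)  # likelihood ratio of N(0,1) to N(mu,1)
        total += r * conditional_is(w, n1, rng)
    return total / n


rng = random.Random(2024)
print(full_is(1000, 10, 2.8, rng), exact_theta())
```

The script prints the full IS estimate next to the value obtained by numerical integration, so the quality of the estimate for a given $n$ and $n_1$ can be judged directly.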


Notes and Comments 


Our discussion of IS for credit portfolios follows Glasserman and Li (2005) closely. 
Theoretical results on the asymptotics of the IS estimator for large portfolios and 
numerical case studies contained in Glasserman and Li (2005) indicate that full IS is a 
very useful tool for dealing with large Bernoulli mixture models. Merino and Nyfeler 
(2004) and Kalkbrener, Lotter and Overbeck (2004) undertook related work—the 
latter paper gives an interesting alternative solution to finding a reasonable IS mean $\mu$
for the factors. 

For a general introduction to importance sampling we refer to the excellent textbook by Glasserman (2003) (see also Asmussen and Glynn 2007; Robert and Casella
1999). For applications of importance sampling to heavy-tailed distributions, where
exponential families cannot be applied directly, see Asmussen, Binswanger and
Højgaard (2000) and Glasserman, Heidelberger and Shahabuddin (1999).

Figure 11.3. Illustration of the significant improvements that can be made in the estimation
of rare-event probabilities for Bernoulli mixture models when importance sampling is applied
at the level of both the factors and the conditional loss distribution given the factors (see
Example 11.26 for details).

An alternative to simulation is the use of analytic approximations to the portfolio 
loss distribution. Applications of the saddle-point approximation (see Jensen 1995) 
are discussed in Martin, Thompson and Browne (2001) and Gordy (2002). 


11.5 Statistical Inference in Portfolio Credit Models 


In the remainder of this chapter we consider two different approaches to the estima- 
tion of portfolio credit risk models. In Section 11.5.1 we discuss the calibration of 
industry threshold models, such as the CreditMetrics model and the portfolio version 
of the Moody’s public-firm EDF model; we focus on the estimation of the factor 
model describing the dependence structure of the critical variables using proxy data 
on equity or asset returns. 

In Sections 11.5.2—11.5.4 we discuss the direct estimation of Bernoulli or Poisson 
mixture models from historical default data. This approach has been less widely 
applied in industry due to the relative scarcity of data on defaults, particularly for
higher-rated firms. However, it has become increasingly feasible with the availability
of larger databases of historical defaults and rating migrations. 


11.5.1 Factor Modelling in Industry Threshold Models 


Recall from Section 11.1.3 that many industry models take the form of a Gaussian 
threshold model $(X, d)$ with $X \sim N_m(0, P)$, where the random vector $X$ contains
the critical variables representing the credit quality or “ability-to-pay” of the obligors 
in the portfolio, the deterministic vector d contains the critical default thresholds, 
and P is the so-called asset correlation matrix, which is estimated with the help of 
a factor model for X. 

Industry models generally separate the calibration of the vector d (or the threshold 
matrix D in a multi-state model) and the calibration of the factor model for X. As 
discussed in Section 11.1.3, in a default-only model the threshold $d_i$ is usually set
at $d_i = \Phi^{-1}(p_i)$, where $p_i$ is an estimate of the default probability for obligor $i$ for
the time period in question (generally one year). Depending on the type of obligor, 
the default probability may be estimated in different ways: for larger corporates it 
may be estimated using credit ratings or using a firm-value approach, such as the 
Moody’s public-firm EDF model; for retail obligors it may be estimated on the basis 
of credit scores. 
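
The threshold mapping is easy to illustrate. The following is a minimal sketch, assuming SciPy is available; the function name `default_thresholds` and the three default probabilities are made up for illustration and are not calibrated to any rating system.

```python
import numpy as np
from scipy.stats import norm

def default_thresholds(p):
    """Map default probabilities p_i to thresholds d_i = Phi^{-1}(p_i)."""
    p = np.asarray(p, dtype=float)
    if np.any((p <= 0.0) | (p >= 1.0)):
        raise ValueError("default probabilities must lie strictly in (0, 1)")
    return norm.ppf(p)

# Illustrative (made-up) one-year default probabilities for three obligors.
p = np.array([0.0005, 0.01, 0.05])
d = default_thresholds(p)
```

By construction, a standard normal critical variable falls below `d[i]` with probability `p[i]`, so better credit quality corresponds to a lower (more negative) threshold.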

We concentrate in this section on the estimation of the factor model for X. We 
recall that this takes the form 


$$X_i = \sqrt{\beta_i}\,\tilde{F}_i + \sqrt{1 - \beta_i}\,\varepsilon_i, \quad i = 1, \dots, m, \tag{11.48}$$ 


where $\tilde{F}_i$ and $\varepsilon_1, \dots, \varepsilon_m$ are independent standard normal variables, and where 
$0 \le \beta_i \le 1$ for all i. The systematic variables $\tilde{F}_i$ are assumed to be of the form 
$\tilde{F}_i = a_i' F$, where F is a vector of common factors satisfying $F \sim N_p(0, \Omega)$ with 
p < m, and where $\Omega$ is a correlation matrix. The factors typically represent country 
and industry effects and the assumption that $\operatorname{var}(\tilde{F}_i) = 1$ imposes the constraint that 
$a_i' \Omega a_i = 1$ for all i. 
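
It may help to write out the correlation structure that the factor model implies. The short derivation below is a sketch that uses only the assumptions stated above (unit variances, specific terms independent of each other and of the common factors, systematic variable $a_i' F$ with factor correlation matrix $\Omega$, and weight $\beta_i$ on the systematic part); it shows how the entries of the asset correlation matrix P arise.

```latex
\begin{align*}
\operatorname{var}(X_i)
  &= \beta_i \operatorname{var}(a_i' F) + (1 - \beta_i)\operatorname{var}(\varepsilon_i)
   = \beta_i\, a_i' \Omega a_i + (1 - \beta_i) = 1, \\
\operatorname{cov}(X_i, X_j)
  &= \sqrt{\beta_i \beta_j}\,\operatorname{cov}(a_i' F,\, a_j' F)
   = \sqrt{\beta_i \beta_j}\; a_i' \Omega a_j, \qquad i \neq j.
\end{align*}
```

Since each critical variable has unit variance, the second line is also the asset correlation between obligors i and j.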

Different industry models use different data for X to calibrate the factor 
model (11.48). The Moody’s Analytics Global Correlation, or GCorr, model has 
sub-models for many different kinds of obligor, including public corporate firms, 
private firms, small and medium-sized enterprises (SMEs), retail customers and 
sovereigns (Huang et al. 2012). The sub-model for public firms (GCorr Corporate) 
is calibrated using data on weekly asset value returns, where asset values are deter- 
mined as part of the public-firm EDF methodology described in Section 10.3.3. In 
the CreditMetrics framework, weekly equity returns are viewed as a proxy for asset 
returns and used to estimate the factor model (RiskMetrics Group 1997). In both 
cases the data contain information about changing credit quality. 

We now provide a sketch of a generic procedure for estimating a factor model 
for corporates where the factors have country- and industry-sector interpretations. 
Specific industry models follow this procedure in outline but may differ in the details 
of the calculations at certain steps. We assume that we have a high-dimensional 
multivariate time series $(X_t)_{1 \le t \le n}$ of asset returns (or other proxy data for changing 
credit quality) over a period of time in which stationarity can be assumed; we 
also assume that each component time series has been scaled to have mean 0 and 
variance 1. 


(1) We first fix the structure of the factor vector F. For example, the first block 
of components might represent country factors and the second block of 
components might represent industry factors. We then assign vectors of factor 
weights $a_i$ to each obligor based on our knowledge of the companies. The 
elements of $a_i$ may simply consist of ones and zeros if the company can be 
clearly identified with a single country and industry, but may also consist of 
weights if the company has significant activity in more than one country or 
more than one industry sector. For example, a firm that does 60% of its 
business in one country and 40% in another would be coded with weights of 0.6 
and 0.4 in the relevant positions of $a_i$. 


(2) We then use cross-sectional estimation techniques to estimate the factor values 
$F_t$ at each time point t. Effectively, the factor estimates $\hat{F}_t$ are constructed as 
weighted sums of the $X_{t,i}$ data for obligors i that are exposed to each factor. 
One way of achieving this is to construct a matrix A with rows $a_i'$ and then to 
estimate a fundamental factor model of the form $X_t = A F_t + \varepsilon_t$ at each time 
point t, as described in Section 6.4.4. 


(3) The raw factor estimates form a multivariate time series of dimension p. We 
standardize each component series to have mean 0 and variance 1 to obtain 
$(\hat{F}_t)_{1 \le t \le n}$ and calculate the sample covariance matrix of the standardized 
factor estimates, which serves as our estimate $\hat{\Omega}$ of $\Omega$. 


(4) We then scale the vectors of factor weights $a_i$ such that the conditions 
$a_i' \hat{\Omega} a_i = 1$ are met for each obligor. 


(5) Time series of estimated systematic variables for each obligor are then 
constructed by calculating $\hat{\tilde{F}}_{t,i} = a_i' \hat{F}_t$ for t = 1, ..., n. 


(6) Finally, we estimate the $\beta_i$ parameters by performing a time-series regression 
of $X_{t,i}$ on $\hat{\tilde{F}}_{t,i}$ for each obligor. 


Note that the accurate estimation of the $\beta_i$ in the last step is particularly important. 
In Section 11.1.5 we showed that there is considerable model risk associated with 
the size of the specific risk component, particularly when the tail of a credit loss 
distribution is of central importance. The estimate of $\beta_i$ is the so-called R-squared 
of the time-series regression model in step (6) and will be largest for the firms whose 
credit-quality changes are best explained by systematic factors. 
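
The generic procedure in steps (2)-(6) can be sketched in a few lines of NumPy. This is an illustration under simplifying assumptions, not a description of any vendor model: the cross-sectional step uses ordinary least squares, the function name `estimate_factor_model` is hypothetical, and the demonstration data are simulated with three made-up factors. If complete blocks of country and industry indicators are used, the weight matrix is rank deficient, in which case `numpy.linalg.lstsq` returns the minimum-norm solution.

```python
import numpy as np

def estimate_factor_model(X, A):
    """X: (n, m) standardized returns; A: (m, p) raw factor weights (rows a_i')."""
    # Step 2: cross-sectional OLS fit of X_t = A F_t + eps_t at every time point.
    F_raw = np.linalg.lstsq(A, X.T, rcond=None)[0].T          # shape (n, p)
    # Step 3: standardize each factor series and estimate Omega.
    F = (F_raw - F_raw.mean(axis=0)) / F_raw.std(axis=0, ddof=1)
    Omega = np.cov(F, rowvar=False)
    # Step 4: rescale the weights so that a_i' Omega a_i = 1.
    scale = np.sqrt(np.einsum("ij,jk,ik->i", A, Omega, A))
    A_scaled = A / scale[:, None]
    # Step 5: estimated systematic variable for each obligor.
    F_tilde = F @ A_scaled.T                                   # shape (n, m)
    # Step 6: R-squared of the regression of X_{t,i} on the systematic variable.
    beta = np.array([np.corrcoef(X[:, i], F_tilde[:, i])[0, 1] ** 2
                     for i in range(X.shape[1])])
    return F, Omega, A_scaled, beta

# Simulated demonstration data (all values below are made up).
rng = np.random.default_rng(0)
n, m, p = 500, 60, 3
Omega_true = np.array([[1.0, 0.3, 0.2], [0.3, 1.0, 0.4], [0.2, 0.4, 1.0]])
A_raw = np.zeros((m, p))
A_raw[np.arange(m), np.arange(m) % p] = 1.0   # most obligors load on one factor
A_raw[0] = [0.6, 0.4, 0.0]                    # one obligor with a 60%/40% split
beta_true = rng.uniform(0.2, 0.6, size=m)
F_true = rng.multivariate_normal(np.zeros(p), Omega_true, size=n)
norm_a = np.sqrt(np.einsum("ij,jk,ik->i", A_raw, Omega_true, A_raw))
X = (np.sqrt(beta_true) * (F_true @ (A_raw / norm_a[:, None]).T)
     + np.sqrt(1.0 - beta_true) * rng.standard_normal((n, m)))
X = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

F_hat, Omega_hat, A_hat, beta_hat = estimate_factor_model(X, A_raw)
```

Because each obligor's own return enters the cross-sectional factor estimate, the R-squared values from this simple sketch tend to be somewhat optimistic when few obligors load on a factor; production implementations typically take more care at this step.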


11.5.2 Estimation of Bernoulli Mixture Models 


We now turn our attention to the estimation of Bernoulli mixture models of portfolio 
credit risk from historical default data. The models we describe are motivated by the 
format of the data we consider, which can be described as repeated cross-sectional 
data. This kind of data, comprising observations of the default or non-default of 
groups of monitored companies in a number of time periods, can be readily extracted 
from the rating-migration and default databases of rating agencies. Since the group 
of companies may differ from period to period, as new companies are rated and 
others default or cease to be rated, we have a cross-section of companies in each 
period, but the cross-section may change from period to period. 

In this section we discuss the estimation of default probabilities and default cor- 
relations for homogeneous groups, e.g. groups with the same credit rating. In Sec- 
tions 11.5.3 and 11.5.4 we consider more complicated one-factor models allowing 
more heterogeneity and make a link to the important class of generalized linear 
mixed models (GLMMs) used in many statistical applications. 

Suppose that we observe historical default numbers over n periods of time for 
a homogeneous group; typically these might be yearly data. For t = 1, ..., n, let 
$m_t$ denote the number of observed companies at the start of period t and let $M_t$ 
denote the number that defaulted during the period; the former will be treated as 
fixed at the outset of the period and the latter as an rv. Suppose further that within 
a time period these defaults are generated by an exchangeable Bernoulli mixture 
model of the kind described in Section 11.2.2. In other words, assume that, given 
some mixing variable $Q_t$ taking values in (0, 1) and the cohort size $m_t$, the number 
of defaults $M_t$ is conditionally binomially distributed and satisfies 
$M_t \mid Q_t = q \sim B(m_t, q)$. Further assume that the mixing variables $Q_1, \dots, Q_n$ are identically 
distributed. We consider two methods for estimating the fundamental parameters 
of the mixing distribution, $\pi = \pi_1$, $\pi_2$ and $\rho_Y$ (default correlation); these are the 
method of moments and the maximum likelihood method. 


A simple moment estimator. For 1 ≤ t ≤ n, let $Y_{t,1}, \dots, Y_{t,m_t}$ be default indicators 
for the $m_t$ companies in the cohort. Suppose we define the rv 


$$\binom{M_t}{k} := \sum_{\{i_1, \dots, i_k\} \subset \{1, \dots, m_t\}} Y_{t,i_1} \cdots Y_{t,i_k}; \tag{11.49}$$ 


this represents the number of possible subgroups of k obligors among the defaulting 
obligors in period t (and takes the value zero when k > $M_t$). By taking expectations 
in (11.49) we get 


$$E\binom{M_t}{k} = \binom{m_t}{k} \pi_k$$ 


and hence 


$$\pi_k = E\left( \binom{M_t}{k} \Big/ \binom{m_t}{k} \right).$$ 


We estimate the unknown theoretical moment $\pi_k$ by taking a natural empirical 
average (11.50) constructed from the n years of data: 


$$\hat{\pi}_k = \frac{1}{n} \sum_{t=1}^n \frac{\binom{M_t}{k}}{\binom{m_t}{k}} = \frac{1}{n} \sum_{t=1}^n \frac{M_t (M_t - 1) \cdots (M_t - k + 1)}{m_t (m_t - 1) \cdots (m_t - k + 1)}. \tag{11.50}$$ 


For k = 1 we get the standard estimator of default probability 


$$\hat{\pi} = \frac{1}{n} \sum_{t=1}^n \frac{M_t}{m_t},$$ 


and $\rho_Y$ can obviously be estimated by taking $\hat{\rho}_Y = (\hat{\pi}_2 - \hat{\pi}^2)/(\hat{\pi} - \hat{\pi}^2)$. The estimator 
$\hat{\pi}_k$ is unbiased for $\pi_k$ and consistent as n → ∞ (for more details see Frey and 
McNeil (2001)). Note that, for $Q_t$ random, consistency requires observations for a 
large number of years; it is not sufficient to observe a large pool in a single year. 
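
The moment estimator is simple to compute from a table of yearly cohort sizes and default counts. The sketch below is a minimal NumPy version; the function names are hypothetical and the five years of counts are made up purely to exercise the code.

```python
import numpy as np

def moment_estimator(M, m, k):
    """Empirical average of falling-factorial ratios, as in (11.50)."""
    M = np.asarray(M, dtype=float)
    m = np.asarray(m, dtype=float)
    ratio = np.ones_like(M)
    for j in range(k):
        # One factor of M(M-1)...(M-k+1) / (m(m-1)...(m-k+1)) per pass;
        # the product is zero automatically whenever M < k.
        ratio *= (M - j) / (m - j)
    return ratio.mean()

def moment_default_correlation(M, m):
    """Return estimates of pi, pi_2 and the default correlation rho_Y."""
    pi1 = moment_estimator(M, m, 1)
    pi2 = moment_estimator(M, m, 2)
    rho = (pi2 - pi1 ** 2) / (pi1 - pi1 ** 2)
    return pi1, pi2, rho

# Made-up default counts M_t and cohort sizes m_t for five years.
M_obs = [5, 2, 9, 0, 4]
m_obs = [100, 120, 110, 90, 100]
pi1_hat, pi2_hat, rho_hat = moment_default_correlation(M_obs, m_obs)
```

With so few years the estimates are of course very noisy, which is exactly the point made above about consistency requiring a long history rather than a large pool.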


Maximum likelihood estimators. To implement a maximum likelihood (ML) 
procedure we assume a simple parametric form for the density of the $Q_t$ (such as beta, 
logit-normal or probit-normal). The joint probability function of the default counts 
$M_1, \dots, M_n$ given the cohort sizes $m_1, \dots, m_n$ can then be calculated using (11.14), 
under the assumption that the $Q_t$ variables in different years are independent. This 
expression is then maximized with respect to the natural parameters of the mixing 
distribution (i.e. a and b in the case of beta and $\mu$ and $\sigma$ for the logit-normal and 
probit-normal). Of course, independence may be an unrealistic assumption for the 
mixing variables, due to the phenomenon of economic cycles, but the method could 
then be regarded as a quasi-maximum likelihood (QML) procedure, which 
misspecifies the serial dependence structure but correctly specifies the marginal distribution 
of defaults in each year and still gives reasonable parameter estimates. 

In practice, it is easiest to use the beta mixing distribution, since, in this case, 
given the group size $m_t$ in period t, the rv $M_t$ has a beta-binomial distribution with 
probability function given in (11.16). The likelihood to be maximized therefore 
takes the form 


$$L(a, b; \text{data}) = \prod_{t=1}^n \binom{m_t}{M_t} \frac{\beta(a + M_t,\, b + m_t - M_t)}{\beta(a, b)},$$ 


and maximization can be performed numerically with respect to a and b. For 
further information about the ML method consult Section A.3. The ML estimates of 
$\pi = \pi_1$, $\pi_2$ and $\rho_Y$ are calculated by evaluating moments of the fitted distribution 
using (11.15); the formulas are given in Example 11.7. 
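
A numerical fit of the beta-binomial likelihood might look as follows. This is a sketch rather than the code used for the results in this chapter: it assumes SciPy, optimizes over log a and log b with the Nelder-Mead method (our choice, to keep both parameters positive), and uses simulated data with made-up parameter values. The moment formulas in `fit_beta_binomial` are the standard ones for a beta mixing distribution, namely pi = a/(a+b), pi_2 = pi(a+1)/(a+b+1) and rho_Y = 1/(a+b+1).

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import betaln, gammaln

def neg_log_lik(theta, M, m):
    """Negative beta-binomial log-likelihood; theta = (log a, log b)."""
    a, b = np.exp(theta)
    log_binom = gammaln(m + 1) - gammaln(M + 1) - gammaln(m - M + 1)
    return -np.sum(log_binom + betaln(a + M, b + m - M) - betaln(a, b))

def fit_beta_binomial(M, m):
    """Return ML estimates of a, b and the implied pi, pi_2 and rho_Y."""
    M = np.asarray(M, dtype=float)
    m = np.asarray(m, dtype=float)
    res = minimize(neg_log_lik, x0=np.log([1.0, 10.0]), args=(M, m),
                   method="Nelder-Mead")
    a, b = np.exp(res.x)
    pi1 = a / (a + b)
    pi2 = pi1 * (a + 1.0) / (a + b + 1.0)
    rho = 1.0 / (a + b + 1.0)
    return a, b, pi1, pi2, rho

# Simulated 20 years of data from a beta mixture (made-up parameters).
rng = np.random.default_rng(42)
m_sim = np.full(20, 500)
Q_sim = rng.beta(1.0, 20.0, size=20)
M_sim = rng.binomial(m_sim, Q_sim)
a_hat, b_hat, pi1_ml, pi2_ml, rho_ml = fit_beta_binomial(M_sim, m_sim)
```

Working on the log scale avoids the need for a constrained optimizer; any other positive reparametrization would serve equally well.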


A comparison of moment estimation and ML estimation. To compare these two 
approaches we conduct a simulation study summarized in Table 11.4. To generate 
data in the simulation study we consider the beta, probit-normal and logit-normal 
mixture models of Section 11.2.2. In any single experiment we generate 20 years of 
data using parameter values that roughly correspond to one of the Standard & Poor’s 
credit ratings CCC, B or BB (see Table 11.3 for the parameter values). The number of 
firms $m_t$ in each of the years is generated randomly using a binomial-beta model to 
give a spread of values typical of real data; the defaults are then generated using one 
of the Bernoulli mixture models, and estimates of $\pi$, $\pi_2$ and $\rho_Y$ are calculated. The 
experiment is repeated 5000 times and a relative root mean square error (RRMSE) 
is estimated for each parameter and each method: that is, we take the square root of 
the estimated MSE and divide by the true parameter value. Methods are compared 
by calculating the percentage increase of the estimated RRMSE with respect to the 
better method (i.e. the RRMSE-minimizing method) for each parameter. 
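
The two summary statistics used in the comparison can be written down directly. The sketch below uses hypothetical function names; the two RRMSE values passed to `pct_increase` are the published (rounded) figures for the default correlation in the CCC probit-normal experiment of Table 11.4. Because the tabulated RRMSE values are rounded, recomputing the percentage increase from them need not reproduce every tabulated value exactly.

```python
import numpy as np

def rrmse(estimates, true_value):
    """Relative root mean square error of replicated estimates."""
    estimates = np.asarray(estimates, dtype=float)
    return np.sqrt(np.mean((estimates - true_value) ** 2)) / true_value

def pct_increase(rrmse_by_method):
    """Percentage increase of each RRMSE relative to the best method."""
    best = min(rrmse_by_method.values())
    return {name: 100.0 * (value / best - 1.0)
            for name, value in rrmse_by_method.items()}

# Rounded RRMSE values for the default correlation (CCC, probit-normal).
delta = pct_increase({"moment": 0.347, "mle_beta": 0.314})
```

Here the moment estimator's RRMSE is roughly 11% larger than that of the ML estimator, while the better method has an increase of zero by definition.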

It may be concluded from Table 11.4 that the ML method is better in all but one 
experiment. Surprisingly, it is better even in the experiments when it is misspecified 
and the true mixing distribution is either probit-normal or logit-normal; in fact, 
in these cases, it offers more of an improvement than in the beta case. This can 
partly be explained by the fact that when we constrain well-behaved, unimodal 
mixing distributions with densities to have the same first and second moments, 
these distributions are very similar (see Figure 11.2). Finally, we observe that the 
ML method tends to outperform the moment method more as we increase the credit 
quality, so that defaults become rarer. 


Table 11.4. Each part of the table relates to a block of 5000 simulations using a particular 
exchangeable Bernoulli mixture model with parameter values roughly corresponding to a 
particular S&P rating class. For each parameter of interest, an estimated RRMSE is tabulated 
for both estimation methods: moment estimation using (11.50) and ML estimation based 
on the beta model. Methods can be compared by using Δ, the percentage increase of the 
estimated RRMSE with respect to the better method (i.e. the RRMSE-minimizing method) 
for each parameter. For each parameter the better method therefore has Δ = 0. The table 
clearly shows that MLE is at least as good as the moment estimator in all but one case. 


                                          Moment          MLE-beta 
Group   True model      Parameter     RRMSE     Δ      RRMSE     Δ 

CCC     Beta            $\pi$         0.101     0      0.101     0 
CCC     Beta            $\pi_2$       0.202     0      0.201     0 
CCC     Beta            $\rho_Y$      0.332     5      0.317     0 
CCC     Probit-normal   $\pi$         0.100     0      0.100     0 
CCC     Probit-normal   $\pi_2$       0.205     1      0.204     0 
CCC     Probit-normal   $\rho_Y$      0.347    11      0.314     0 
CCC     Logit-normal    $\pi$         0.101     0      0.101     0 
CCC     Logit-normal    $\pi_2$       0.209     1      0.208     0 
CCC     Logit-normal    $\rho_Y$      0.357    11      0.320     0 
B       Beta            $\pi$         0.130     0      0.130     0 
B       Beta            $\pi_2$       0.270     0      0.269     0 
B       Beta            $\rho_Y$      0.396     8      0.367     0 
B       Probit-normal   $\pi$         0.130     0      0.130     0 
B       Probit-normal   $\pi_2$       0.286     3      0.277     0 
B       Probit-normal   $\rho_Y$      0.434    19      0.364     0 
B       Logit-normal    $\pi$         0.131     0      0.132     0 
B       Logit-normal    $\pi_2$       0.308     7      0.289     0 
B       Logit-normal    $\rho_Y$      0.493    26      0.392     0 
BB      Beta            $\pi$         0.199     0      0.199     0 
BB      Beta            $\pi_2$       0.435     0      0.438     1 
BB      Beta            $\rho_Y$      0.508     7      0.476     0 
BB      Probit-normal   $\pi$         0.197     0      0.197     0 
BB      Probit-normal   $\pi_2$       0.492    10      0.446     0 
BB      Probit-normal   $\rho_Y$      0.607    27      0.480     0 
BB      Logit-normal    $\pi$         0.196     0      0.196     0 
BB      Logit-normal    $\pi_2$       0.572    24      0.462     0 
BB      Logit-normal    $\rho_Y$      0.752    45      0.517     0 

11.5.3 Mixture Models as GLMMs 


A one-factor Bernoulli mixture model. Recall the simple one-factor model (11.17), 
which generalizes the exchangeable model in Section 11.2.2, and consider the case 
where $\sigma_i = \sigma$ for all obligors i. Rewriting slightly, this model has the form 


$$p_i(\Psi) = h(\mu + \beta' x_i + \Psi), \tag{11.51}$$ 


where h is a link function, the vector $x_i$ contains covariates for the ith firm, such as 
indicators for group membership or key balance sheet ratios, and $\beta$ and $\mu$ are model 
parameters. Examples of link functions include the standard normal df $\Phi(x)$ and the 
logistic df $(1 + e^{-x})^{-1}$. The scale parameter $\sigma$ has been subsumed in the normally 
distributed random variable $\Psi \sim N(0, \sigma^2)$, representing a common or systematic 
factor. 

This model can be turned into a multi-period model for default counts in different 
periods by assuming that a series of mixing variables $\Psi_1, \dots, \Psi_n$ generates default 
dependence in each time period t = 1, ..., n. The default indicator $Y_{t,i}$ for the 
ith company in time period t is assumed to be Bernoulli with default probability 
$p_{t,i}(\Psi_t)$ depending on $\Psi_t$ according to 


$$p_{t,i}(\Psi_t) = h(\mu + x_{t,i}' \beta + \Psi_t), \tag{11.52}$$ 


where $\Psi_t \sim N(0, \sigma^2)$ and $x_{t,i}$ are covariates for the ith company in time period t. 
Moreover, the default indicators $Y_{t,1}, \dots, Y_{t,m_t}$ in period t are assumed to be 
conditionally independent given $\Psi_t$. 

To complete the model we need to specify the joint distribution of $\Psi_1, \dots, \Psi_n$, 
and it is easiest to assume that these are iid mixing variables. To capture possible 
economic cycle effects causing dependence between numbers of defaults in 
successive time periods, one could either enter covariates at the level of $x_{t,i}$ that are known 
to be good proxies for “the state of the economy”, such as changes in GDP over 
the time period, or an index like the Chicago Fed National Activity Index (CFNAI) 
in the US, or one could consider a serially dependent time-series structure for the 
systematic factors $(\Psi_t)$. 
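
To get a feel for the multi-period model, it is instructive to simulate from it. The sketch below covers the simplest case only: no covariates, a probit link and iid normal yearly effects, so that the indicators within a year are exchangeable and the yearly default count is conditionally binomial. The function name and the parameter values are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm

def simulate_default_counts(mu, sigma, m, rng):
    """Simulate yearly default counts with probit link and iid year effects."""
    m = np.asarray(m)
    psi = sigma * rng.standard_normal(m.shape[0])   # one systematic effect per year
    p = norm.cdf(mu + psi)                          # conditional default probabilities
    M = rng.binomial(m, p)                          # defaults are independent given psi
    return M, p

rng = np.random.default_rng(7)
m_cohort = np.full(20, 500)                         # 20 years, 500 firms per year
M_sim, p_sim = simulate_default_counts(-2.4, 0.24, m_cohort, rng)
```

The year-to-year variation in `p_sim` is what generates default dependence within each cohort; setting `sigma` to zero would give independent defaults with a constant default probability.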


A one-factor Poisson mixture model. When considering higher-grade portfolios 
of companies with relatively low default risk, there may sometimes be advantages 
(particularly in the stability of fitting procedures) in formulating Poisson mixture 
models instead of Bernoulli mixture models. A multi-period mixture model based 
on Definition 11.14 can be constructed by assuming that the default count 
variable $\tilde{Y}_{t,i}$ for the ith company in time period t is conditionally Poisson with rate 
parameter $\lambda_{t,i}(\Psi_t)$ depending on $\Psi_t$ according to 


$$\lambda_{t,i}(\Psi_t) = \exp(\mu + x_{t,i}' \beta + \Psi_t), \tag{11.53}$$ 


with all other elements of the model as in (11.52). Again the variables $\tilde{Y}_{t,1}, \dots, \tilde{Y}_{t,m_t}$ 
are assumed to be conditionally independent given $\Psi_t$. 


GLMMs. Both the multi-period Bernoulli and Poisson mixture models in (11.52) 
and (11.53) belong to a family of widely used statistical models known as generalized 
linear mixed models (GLMMs). The three basic elements of such a model are as 
follows. 


(1) The vector of random effects. In our examples this is the vector $(\Psi_1, \dots, \Psi_n)$ 
containing the systematic factors for each time period. 


(2) A distribution from the exponential family for the conditional distribution of 
the responses ($Y_{t,i}$ or $\tilde{Y}_{t,i}$) given the random effects. Responses are assumed to 
be conditionally independent given the random effects. The Bernoulli, 
binomial and Poisson distributions all belong to the exponential family (see, for 
example, McCullagh and Nelder 1989, p. 28). 


(3) A link function relating $E(Y_{t,i} \mid \Psi_t)$, the mean response conditional on the 
random effects, to the so-called linear predictor. In our examples the linear 
predictor for $Y_{t,i}$ is 


$$\eta_{t,i}(\Psi_t) = \mu + x_{t,i}' \beta + \Psi_t. \tag{11.54}$$ 


We have considered the so-called probit and logit link functions in the 
Bernoulli case and the log-link function in the Poisson case. (Note that it 
is usual in GLMMs to write the model as $g(E(Y_{t,i} \mid \Psi_t)) = \eta_{t,i}(\Psi_t)$ and to 
refer to g as the link function; hence the probit link function is the quantile 
function of the standard normal, and the link in the Poisson case (11.53) is 
referred to as “log” rather than “exponential”.) 


When no random effects are modelled in a GLMM, the model is simply known as a 
generalized linear model, or GLM. The role of the random effects in the GLMM is, 
in a sense, to capture patterns of variability in the responses that cannot be explained 
by the observed covariates alone, but which might be explained by additional unob- 
served factors. In our case, these unobserved factors are bundled into a time-period 
effect that we loosely describe as the state of the economy in that time period; 
alternatively, we refer to it as the systematic risk. 

The GLMM framework allows models of much greater complexity. We can add 
further random effects to obtain multi-factor mixture models. For example, we might 
know the industry sector of each firm and wish to include random effects for 
sectors that are nested within the year effect; in this way we might capture additional 
variability associated with economic effects in different sectors over and above 
the global variability associated with the year effect. Such models can be 
considered in the GLMM framework by allowing the linear predictor in (11.54) to 
take the form $\eta_{t,i}(\Psi_t) = \mu + x_{t,i}' \beta + z_{t,i}' \Psi_t$ for some vector of random effects 
$\Psi_t = (\Psi_{t,1}, \dots, \Psi_{t,p})'$; the vector $z_{t,i}$ is a known design element of the model that 
selects the random effects that are relevant to the response $Y_{t,i}$. We would then have 
a total of p × n random effects in the model. We may or may not want to model 
serial dependence in the time series $\Psi_1, \dots, \Psi_n$. 




Inference for GLMMs. Full ML inference for a GLMM is an option for the simplest 
models. Consider the form of the likelihood for the one-factor models in (11.52) 
and (11.53). If we write $p_{Y_{t,i} \mid \Psi_t}(y \mid \psi)$ for the conditional probability mass function 
of the response $Y_{t,i}$ (or $\tilde{Y}_{t,i}$) given $\Psi_t$, we have, for data 
$\{Y_{t,i} : t = 1, \dots, n,\ i = 1, \dots, m_t\}$, 


$$L(\beta, \sigma; \text{data}) = \int \cdots \int \left( \prod_{t=1}^n \prod_{i=1}^{m_t} p_{Y_{t,i} \mid \Psi_t}(Y_{t,i} \mid \psi_t) \right) f(\psi_1, \dots, \psi_n) \, d\psi_1 \cdots d\psi_n, \tag{11.55}$$ 


where f denotes the assumed joint density of the random effects. If we do not assume 
independent random effects from time period to time period, then we are faced 
with an n-dimensional integral (or an (n × p)-dimensional integral in multi-factor 
models). Assuming iid Gaussian random effects with marginal Gaussian density $f_\Psi$, 
the likelihood (11.55) becomes 


$$L(\beta, \sigma; \text{data}) = \prod_{t=1}^n \int \left( \prod_{i=1}^{m_t} p_{Y_{t,i} \mid \Psi_t}(Y_{t,i} \mid \psi_t) \right) f_\Psi(\psi_t) \, d\psi_t, \tag{11.56}$$ 


so we have a product of one-dimensional integrals and this can be easily 
evaluated numerically and maximized over the unknown parameters. Alternatively, faster 
approximate likelihood methods, such as penalized quasi-likelihood (PQL) and 
marginal quasi-likelihood (MQL), can be used (see Notes and Comments). 
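
One way to evaluate the product of one-dimensional integrals is Gauss-Hermite quadrature. The sketch below is an illustration under assumptions of our own choosing: a probit link, group indicators as the only covariates (so that default counts can be pooled by year and group and the Bernoulli product reduces, up to a constant, to a product of binomial terms), a fixed 64-point non-adaptive rule, and Nelder-Mead maximization over the group intercepts and log sigma. The data are simulated with made-up parameters. A fixed rule can lose accuracy when cohorts are very large and the integrand is sharply peaked, which is one reason adaptive rules are often preferred.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import logsumexp
from scipy.stats import binom, norm

# Gauss-Hermite rule: sum_k w_k g(x_k) approximates the integral of g(x) exp(-x^2).
NODES, WEIGHTS = np.polynomial.hermite.hermgauss(64)
LOG_W = np.log(WEIGHTS / np.sqrt(np.pi))   # log-weights for a standard normal

def neg_log_lik(theta, M, m):
    """Negative log-likelihood for grouped counts with iid normal year effects.

    theta = (mu_1, ..., mu_R, log sigma); M and m are (n, R) integer arrays.
    """
    mu, sigma = theta[:-1], np.exp(theta[-1])
    psi = np.sqrt(2.0) * sigma * NODES                       # values of the year effect
    p = norm.cdf(mu[None, None, :] + psi[:, None, None])     # shape (K, 1, R)
    log_f = binom.logpmf(M[None], m[None], p).sum(axis=2)    # shape (K, n)
    log_int = logsumexp(log_f + LOG_W[:, None], axis=0)      # one integral per year
    return -log_int.sum()

# Simulated data: 20 years, two groups of 200 firms (made-up parameters).
rng = np.random.default_rng(1)
n_years, mu_true, sigma_true = 20, np.array([-2.4, -1.7]), 0.24
m_grp = np.full((n_years, 2), 200)
psi_true = sigma_true * rng.standard_normal(n_years)
M_grp = rng.binomial(m_grp, norm.cdf(mu_true[None, :] + psi_true[:, None]))

theta0 = np.array([-2.0, -2.0, np.log(0.5)])
fit = minimize(neg_log_lik, theta0, args=(M_grp, m_grp), method="Nelder-Mead")
mu_hat, sigma_hat = fit.x[:-1], np.exp(fit.x[-1])
```

Working with log-sum-exp rather than summing the weighted likelihood values directly avoids numerical underflow when the yearly likelihood contributions are very small.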

Another attractive possibility is to treat inference for these models from a Bayesian 
point of view and to use Markov chain Monte Carlo (MCMC) methods to make 
inferences about parameters (see, for example, McNeil and Wendin 2006, 2007). The 
Bayesian approach has two main advantages. First, a Bayesian MCMC approach 
allows us to work with much more complex models than can be handled in the like- 
lihood framework, such as a model with serially dependent random effects. Second, 
the Bayesian approach is ideal for handling the considerable parameter uncertainty 
in portfolio credit risk, particularly in models for higher-rated counterparties, where 
default data are scarce. 


11.5.4 A One-Factor Model with Rating Effect 


In this section we fit a Bernoulli mixture model to annual default count data from 
Standard & Poor’s for the period 1981-2000; these data have been reconstructed 
from published default rates in Brand and Bahar (2001, Table 13, pp. 18-21). 
Standard & Poor’s uses the ratings AAA, AA, A, BBB, BB, B, CCC, but because the 
observed one-year default rates for AAA-rated and AA-rated firms are mostly zero, 
we concentrate on the rating categories A–CCC. 

In our model we assume a single yearly random effect representing the state of 
the economy and treat the rating category as an observed covariate for each firm 
in each time period. Our model is a particular instance of the one-factor Bernoulli 
mixture model in (11.52) and a multi-period extension of the model described in 
Example 11.8. We assume for simplicity that random effects in each year are iid 
normal, which allows us to use the likelihood (11.56). 

Since we are able to pool companies into groups by year and rating category, we 
note that it is possible to reformulate the model as a binomial mixture model. Let 
r = 1, ..., 5 index the five rating categories in our study, and write $m_{t,r}$ for the 
number of followed companies in year t with rating r, and $M_{t,r}$ for the number of 
these that default. Our model assumption is that, conditional on $\Psi_t$ (and the group 
sizes), the default counts $M_{t,1}, \dots, M_{t,5}$ are independent and are distributed in such 
a way that $M_{t,r} \mid \Psi_t = \psi \sim B(m_{t,r}, p_r(\psi))$. Using the probit link, the conditional 
default probability of an r-rated company in year t is given by 


$$p_r(\Psi_t) = \Phi(\mu_r + \Psi_t). \tag{11.57}$$ 


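The model may be fitted under the assumption of iid random effects in each 
year by straightforward maximization of the likelihood in (11.56). The parameter 
estimates and obtained standard errors are given in Table 11.5, together with the 
estimated default probabilities $\hat{\pi}^{(r)}$ for each rating category and estimated default 
correlations $\hat{\rho}_Y^{(r_1, r_2)}$ implied by the parameter estimates. Writing $\Psi$ for a generic 
random effect variable, the default probability for rating category r is given by 


$$\hat{\pi}^{(r)} = E(\hat{p}_r(\Psi)) = \int_{-\infty}^{\infty} \Phi(\hat{\mu}_r + \hat{\sigma} z)\, \phi(z) \, dz, \quad 1 \le r \le 5,$$ 


where $\phi$ is the standard normal density. The default correlation for two firms with 
ratings $r_1$ and $r_2$ in the same year is calculated easily from the joint default probability 
for these two firms, which is 


$$\hat{\pi}_2^{(r_1, r_2)} = E(\hat{p}_{r_1}(\Psi)\, \hat{p}_{r_2}(\Psi)) = \int_{-\infty}^{\infty} \Phi(\hat{\mu}_{r_1} + \hat{\sigma} z)\, \Phi(\hat{\mu}_{r_2} + \hat{\sigma} z)\, \phi(z) \, dz.$$ 


The default correlation is then 


$$\hat{\rho}_Y^{(r_1, r_2)} = \frac{\hat{\pi}_2^{(r_1, r_2)} - \hat{\pi}^{(r_1)} \hat{\pi}^{(r_2)}}{\sqrt{\bigl(\hat{\pi}^{(r_1)} - (\hat{\pi}^{(r_1)})^2\bigr)\bigl(\hat{\pi}^{(r_2)} - (\hat{\pi}^{(r_2)})^2\bigr)}}.$$ 


Note that the default correlations are correlations between event indicators for very 
low probability events and are necessarily very small. 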
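
The implied default probabilities and default correlations are one-dimensional integrals and can be approximated accurately by Gauss-Hermite quadrature. The sketch below (hypothetical function name, 64 nodes chosen arbitrarily) uses the rounded parameter estimates from Table 11.5 as inputs, so its output agrees with the tabulated values only up to the effect of that rounding.

```python
import numpy as np
from scipy.stats import norm

# Gauss-Hermite nodes and weights rescaled for a standard normal variable.
_x, _w = np.polynomial.hermite.hermgauss(64)
Z = np.sqrt(2.0) * _x
W = _w / np.sqrt(np.pi)

def implied_moments(mu, sigma):
    """Default probabilities and default correlation matrix implied by (mu_r, sigma)."""
    mu = np.asarray(mu, dtype=float)
    p = norm.cdf(mu[:, None] + sigma * Z[None, :])   # conditional PDs, shape (R, K)
    pi = p @ W                                       # unconditional default probabilities
    pi2 = (p * W) @ p.T                              # joint default probabilities
    sd = np.sqrt(pi - pi ** 2)
    rho = (pi2 - np.outer(pi, pi)) / np.outer(sd, sd)
    return pi, rho

# Rounded estimates for ratings A, BBB, BB, B, CCC and the scaling parameter.
mu_table = np.array([-3.43, -2.92, -2.40, -1.69, -0.84])
pi_impl, rho_impl = implied_moments(mu_table, 0.24)
```

For the probit link the default probability also has the closed form $\Phi(\mu_r / \sqrt{1 + \sigma^2})$, which provides a convenient check on the quadrature.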

The model in (11.57) assumes that the variance of the systematic factor $\Psi_t$ is 
the same for all firms in all years. When compared with the very general Bernoulli 
mixture model (11.23) we might be concerned that the simple model considered in 
this section does not allow for enough heterogeneity in the variance of the systematic 
risk. A simple extension of the model is to allow the variance to be different for 
different rating categories: that is, to fit a model where $p_r(\Psi_t) = \Phi(\mu_r + \sigma_r \Psi_t)$ and 
where $\Psi_t$ is a standard normally distributed random effect. This increases the number 
of parameters in the model by four but is no more difficult to fit than the basic model. 
The maximized value of the log-likelihood in the model with heterogeneous scaling 
is -2557.4, and the value in the model with homogeneous scaling is -2557.7; a 
likelihood ratio test suggests that no significant improvement results from allowing 
heterogeneous scaling. If rating is the only categorical variable, the simple model 
seems adequate, but if we had more information on the industrial and geographical 
sectors to which the companies belonged, it would be natural to introduce further 
random effects for these sectors and to allow more heterogeneity in the model in 
this way. 


Table 11.5. Maximum likelihood parameter estimates and standard errors (se) for a 
one-factor Bernoulli mixture model fitted to historical Standard & Poor’s one-year default 
data, together with the implied estimates of default probabilities $\hat{\pi}^{(r)}$ and default correlations 
$\hat{\rho}_Y^{(r_1, r_2)}$. The MLE of the scaling parameter $\sigma$ is 0.24 with standard error 0.05. Note that we 
have tabulated default correlation in absolute terms and not in percentage terms. 


Parameter                          A         BBB       BB        B         CCC 

$\mu_r$                            -3.43     -2.92     -2.40     -1.69     -0.84 
se($\mu_r$)                         0.13      0.09      0.07      0.06      0.08 

$\hat{\pi}^{(r)}$                   0.0004    0.0023    0.0097    0.0503    0.2078 

$\hat{\rho}_Y^{(r_1, r_2)}$  A      0.00040   0.00077   0.00130   0.00219   0.00304 
                             BBB    0.00077   0.00149   0.00255   0.00435   0.00615 
                             BB     0.00130   0.00255   0.00440   0.00763   0.01081 
                             B      0.00219   0.00435   0.00763   0.01328   0.01906 
                             CCC    0.00304   0.00615   0.01081   0.01906   0.02788 
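
The likelihood ratio comparison of homogeneous and heterogeneous scaling can be reproduced from the two maximized log-likelihood values quoted in the text. The sketch below assumes the usual chi-squared reference distribution with four degrees of freedom, one for each additional scaling parameter; this is a standard large-sample approximation rather than an exact result.

```python
from scipy.stats import chi2

loglik_heterogeneous = -2557.4   # separate scaling parameter per rating category
loglik_homogeneous = -2557.7     # common scaling parameter
extra_params = 4                 # additional parameters in the larger model

lr_stat = 2.0 * (loglik_heterogeneous - loglik_homogeneous)
p_value = chi2.sf(lr_stat, df=extra_params)
```

The statistic is about 0.6 and the approximate p-value is well above any conventional significance level, in line with the conclusion that heterogeneous scaling gives no significant improvement.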

The implied default probability and default correlation estimates in Table 11.5 can 
be a useful resource for calibrating simple credit models to homogeneous groups 
defined by rating. For example, to calibrate a Clayton copula to group BB we use 
the inputs $\pi^{(3)} = 0.0097$ and $\rho_Y^{(3,3)} = 0.0044$ to determine the parameter $\theta$ of the 
Clayton copula (see Example 11.13). Note also that we can now immediately use 
the scaling results of Section 11.3 to calculate approximate risk measures for large 
portfolios of companies that have been rated with the Standard & Poor’s system (see 
Example 11.20). 
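
The Clayton calibration for group BB can be sketched numerically. We assume here, without reproducing Example 11.13, that the model is an exchangeable threshold model in which the joint default probability of two obligors is the bivariate Clayton copula evaluated at the common default probability. The parameter is then found by matching this to the joint default probability implied by the default probability and default correlation. Function names and the root-finding bracket are our own choices.

```python
from scipy.optimize import brentq

def clayton_joint_default_prob(pi, theta):
    """Bivariate Clayton copula C_theta(pi, pi) for theta > 0."""
    return (2.0 * pi ** (-theta) - 1.0) ** (-1.0 / theta)

def calibrate_clayton(pi, rho):
    """Find theta such that C_theta(pi, pi) matches the implied joint default probability."""
    pi2_target = pi ** 2 + rho * pi * (1.0 - pi)
    objective = lambda theta: clayton_joint_default_prob(pi, theta) - pi2_target
    return brentq(objective, 1e-4, 10.0)

pi_bb, rho_bb = 0.0097, 0.0044          # group BB inputs from Table 11.5
theta_bb = calibrate_clayton(pi_bb, rho_bb)
```

The bracket works because the copula value increases from roughly the independence value at the lower end towards the default probability itself as the parameter grows, so the target joint default probability lies in between.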


Notes and Comments 


The main references for our account of industry factor models are Huang et al. 
(2012) and RiskMetrics Group (1997). 

The estimator (11.50) for joint default probabilities is also used in Lucas (1995) 
and Nagpal and Bahar (2001), although de Servigny and Renault (2002) suggest 
there may be problems with this estimator for groups with low default rates. A 
related moment-style estimator has been suggested by Gordy (2000) and appears to 
have a similar performance to (11.50) (see Frey and McNeil 2003). A further paper 
on default correlation estimation is Gordy and Heitfield (2002). 

A good overview article on generalized linear mixed models is Clayton (1996). 
For generalized linear models a standard reference is McCullagh and Nelder (1989) 
(see also Fahrmeir and Tutz 1994). 

The analysis of Section 11.5.4 is very similar to the analysis in Frey and McNeil 
(2003) (where heterogeneous variances for each rating category were assumed). The 
results reported in this book were obtained by full maximization of the likelihood 
using our own R code. Very similar results are obtained with the glmer function 
in the lme4 R package, which maximizes an adaptive Gauss–Hermite 
approximation to the log-likelihood. GLMMs may also be estimated using the approximate 
penalized quasi-likelihood (PQL) and marginal quasi-likelihood (MQL) methods 
(see Breslow and Clayton 1993). For a Bayesian approach to fitting the model using 
Markov chain Monte Carlo techniques, see McNeil and Wendin (2007), who also 
incorporate an autoregressive time-series structure for the random effects. 

Although we have only described default models it is also possible to analyse 
rating migrations in the generalized linear model framework (with or without ran- 
dom effects). A standard model is the ordered probit model, which is used without 
random effects in Nickell, Perraudin and Varotto (2000) to provide evidence of time 
variation in default rates attributable to macroeconomic factors; a similar message is 
found in Bangia et al. (2002). McNeil and Wendin (2006) show how random effects 
and unobserved factors may be included in such models and carry out Bayesian 
inference. See also Gagliardini and Gouriéroux (2005), in which a variety of rating- 
migration models with serially dependent unobserved factors are studied. 

There is a large literature on models with latent structure designed to capture 
the dynamics of systematic risk, and there is quite a lot of variation in the types 
of model considered. Crowder, Davis and Giampieri (2005) use a two-state hidden 
Markov structure to capture periods of high and low default risk, Koopman, Lucas 
and Klaassen (2005) use an unobserved components time-series model to describe 
US company failure rates, Koopman, Lucas and Monteiro (2008) develop a latent 
factor intensity model for rating transitions, and Koopman, Lucas and Schwaab 
(2012) combine macroeconomic factors, unobserved frailties and industry effects 
in a model of US defaults through the crisis of 2008. 


12 


Portfolio Credit Derivatives 


In this chapter we study portfolio credit derivatives such as collateralized debt obli- 
gations (CDOs) and related products. The primary use of portfolio credit derivatives 
is in the securitization of credit risk: that is, the transformation of credit risk into 
securities that may be bought and sold by investors. The market for portfolio credit 
derivatives peaked in the period leading up to the 2007-9 credit crisis (as discussed 
in Section 1.2.1) and has only partly recovered since. 

In Section 12.1 we describe the most important portfolio credit derivatives and 
their properties. We also provide some more discussion of the role that these prod- 
ucts played in the credit crisis. Section 12.2 introduces copula models for portfolio 
credit risk, which have become the market standard in pricing CDOs and related 
credit derivatives. In Section 12.3 we discuss pricing and model calibration in factor 
copula models. More advanced dynamic portfolio credit risk models are studied in 
Chapter 17. 

This chapter makes extensive use of the analysis of single-name credit risk models 
in Sections 10.1–10.4 and of basic notions in copula theory. We restrict our attention 
to models for random default times with deterministic hazard functions without 
adding the extra complexity of doubly stochastic default times as in Sections 10.5 
and 10.6. The simpler models are sufficient to understand the key features of portfolio 
credit derivative pricing. 


12.1 Credit Portfolio Products 


In this section we describe the pay-off and qualitative properties of certain important 
credit portfolio products such as CDOs. We begin by introducing the necessary 
notation. 

We consider a portfolio of m firms with default times $\tau_1, \dots, \tau_m$. In keeping 
with the notation introduced in the earlier credit chapters, the random vector 
$Y_t = (Y_{t,1}, \dots, Y_{t,m})'$ with $Y_{t,i} = I_{\{\tau_i \le t\}}$ describes the default state of the portfolio at 
some point in time t > 0. Note that $Y_{t,i} = 1$ if firm i has defaulted by time t, and 
$Y_{t,i} = 0$ otherwise. We assume throughout that there are no simultaneous defaults, 
so we may define the ordered default times $T_0 < T_1 < \cdots < T_m$ by setting $T_0 = 0$ 
and recursively setting 


$$T_n := \min\{\tau_i \in \{\tau_1, \dots, \tau_m\} : \tau_i > T_{n-1}\}, \quad 1 \le n \le m.$$ 

Figure 12.1. Schematic representation of the payments in a CDO structure. 


As in Chapter 11, the exposure to firm (or so-called reference entity) i is denoted by $e_i$ 
and the percentage loss given default (LGD) of firm i is denoted by $\delta_i \in [0, 1]$. The 
cumulative loss of the portfolio up to time t is therefore given by $L_t = \sum_{i=1}^m \delta_i e_i Y_{t,i}$. 
While $\delta_i$ and $e_i$ may in principle be random, we mostly work with deterministic 
exposures and LGDs; further assumptions about these quantities are introduced as 
and when needed. 
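
As a small illustration of this notation, the following Python sketch computes the 
default indicators, the cumulative loss $L_t$ and the ordered default times for a given 
set of default times. The function names and all numerical inputs are illustrative 
assumptions and are not taken from the text. 

```python
def default_indicators(taus, t):
    """Y_{t,i} = 1 if firm i has defaulted by time t, and 0 otherwise."""
    return [1 if tau <= t else 0 for tau in taus]


def cumulative_loss(taus, exposures, lgds, t):
    """L_t = sum_i delta_i * e_i * Y_{t,i}."""
    y = default_indicators(taus, t)
    return sum(d * e * yi for d, e, yi in zip(lgds, exposures, y))


def ordered_default_times(taus):
    """T_0 = 0 followed by T_1 < ... < T_m (assumes no simultaneous defaults)."""
    return [0.0] + sorted(taus)


# Hypothetical portfolio of m = 4 firms (illustrative numbers only).
taus = [7.2, 1.5, 12.0, 3.1]        # default times in years
exposures = [10.0, 5.0, 8.0, 2.0]   # e_i
lgds = [0.6, 0.5, 0.4, 1.0]         # delta_i

print(default_indicators(taus, 5.0))                # [0, 1, 0, 1]
print(cumulative_loss(taus, exposures, lgds, 5.0))  # 0.5*5 + 1.0*2 = 4.5
print(ordered_default_times(taus))                  # [0.0, 1.5, 3.1, 7.2, 12.0]
```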


12.1.1 Collateralized Debt Obligations 


Before the credit crisis of 2007-9 CDO markets were a fast-growing segment of the 
credit market. Although activity on CDO markets has slowed down since the crisis, 
CDOs and credit products with a similar structure remain an important asset class 
for risk managers to study. 

A CDO is a financial instrument for the securitization of a portfolio of credit 
products such as bonds, loans or mortgages. This portfolio forms the so-called 
asset pool underlying the contract. The CDOs that are traded in practice come in 
many different varieties, but the basic structure is the same. The assets are sold to 
a special-purpose vehicle (SPV): a company that has been set up with the single 
purpose of carrying out the securitization deal. To finance the acquisition of the 
assets, the SPV issues securities in tranches of differing seniority, which form the 
liability side of the structure. The tranches of the liability side are called (in order 
of increasing seniority) equity, mezzanine and senior tranches (sometimes there 
are also super-senior tranches). The rules that determine the exact cash flow of the 
tranches are known as the waterfall structure of the CDO. These rules can be quite 
complex. Roughly speaking, the waterfall structure ensures that losses due to credit 
events on the asset side are borne first by the equity tranche; if the equity tranche 
is exhausted, losses are borne by the mezzanine tranches and only thereafter by the 
senior tranches. The credit quality of the more senior tranches is therefore usually 
higher than the average credit quality of the asset pool. The payments associated 
with a typical CDO are depicted schematically in Figure 12.1. 

CDOs where the asset pool consists mainly of bonds are known as collateralized 
bond obligations (CBOs); if the asset side consists mainly of loans, a CDO is termed 
a collateralized loan obligation (CLO); CDOs for the securitization of mortgages 
are also known as mortgage-backed securities (MBSs) or asset-backed securities 
(ABSs). There is also a liquid market for synthetic CDO tranches. In these contracts 
payments are triggered by default events in a pool of reference names (typically 
major corporations), but there are no actual assets underlying the contract. A precise 
pay-off description for synthetic CDO tranches is given in Section 12.1.2. 

There are a number of economic motivations for arranging a CDO transaction. 
To begin with, in a typical CDO structure a large part of the asset pool is allocated 
to the senior tranches with a fairly high credit quality, even if the quality of the 
underlying assets is substantially lower. For instance, according to Hull and White 
(2010), for a typical ABS created from residential mortgages, about 75-80% of the 
underlying mortgage principal is allocated to senior tranches with a AAA rating. 
Many institutional investors prefer an investment in highly rated securities because 
of legal or institutional constraints. Securitization can therefore be a way to sell a 
large part of the underlying assets to investors who are unable to invest directly in 
the asset pool. Another incentive to set up a CDO transaction is related to capital 
adequacy; CDOs are often issued by banks who want to sell some of the credit- 
risky securities on their balance sheet in order to reduce their regulatory capital 
requirements. 

Securitization via CDOs or ABSs is an important tool in credit markets. It allows 
lenders to reduce concentration risk and leverage and to refinance themselves more 
efficiently. Securitization can therefore increase the lending capacity of the financial 
sector. On the other hand, the credit crisis of 2007-9 clearly exposed a number of 
problems related to securitization and the use of asset-backed CDOs. 


• Securitization can create incentive problems: if a mortgage originator knows 
that most of the mortgages he sells to homeowners will be securitized later 
on, he has little interest in evaluating the credit quality of the borrowers care- 
fully. This can lead to a deterioration of lending standards. There is a lot 
of evidence that this actually happened in the years preceding the subprime 
crisis (see, for example, Crouhy, Jarrow and Turnbull 2008). This problem 
could be addressed by better aligning the interests of loan originators and 
of ABS investors. For instance, originators could be forced to keep a certain 
percentage of all the tranches they sell on the securitization market. 


• The exact cash-flow structure of most asset-backed CDOs is very complicated. 
In fact, the legal documentation defining the payments of an asset-backed 
CDO can run to several hundred pages. This makes it difficult for investors 
to form an opinion of the value and the riskiness of any given tranche, thus 
contributing to the low trading volume in securitization markets during the 
credit crisis. At the height of the subprime crisis CDO products were on offer 
that crossed ethical and even legal boundaries. An example is the infamous 
ABACUS 2007-AC1 CDO of Goldman Sachs (see Duffie (2010) for details). 


• CDOs and ABSs were clearly misused by banks before the credit crisis to 
exploit regulatory arbitrage. They allowed loan-related credit risk to be 
transferred from the banking book to the trading book, where it enjoyed a more 
lenient capital treatment. By holding tranches that had been overoptimistically 
rated as AAA in the trading book, banks were able to substantially lower their 
regulatory capital. 


• The pay-off distribution associated with CDO tranches is very sensitive to 
the characteristics of the underlying asset pool, which makes it difficult to 
properly assess the risk associated with these instruments. In particular, a 
small increase in default correlation often leads to a substantial increase in 
the likelihood of losses for senior tranches. An intuitive explanation of this 
correlation sensitivity is given below. 


Stylized CDOs and correlation sensitivity. In order to gain a better understanding 
of the main qualitative features of CDOs without getting bogged down in 
the details of the waterfall structure, we introduce a hypothetical contract that 
we label a stylized CDO. We consider a portfolio of $m$ firms with cumulative 
loss $L_t = \sum_{i=1}^m \delta_i e_i Y_{t,i}$ and deterministic exposures. The stylized CDO has $k$ 
tranches, indexed by $\kappa \in \{1, \dots, k\}$ and characterized by attachment points 
$0 = K_0 < K_1 < \cdots < K_k \le \sum_{i=1}^m e_i$. The value of the notional corresponding to 
tranche $\kappa$ can be described as follows. Initially, the notional is equal to $K_\kappa - K_{\kappa-1}$; 
it is reduced whenever there is a default event such that the cumulative loss falls 
in the layer $[K_{\kappa-1}, K_\kappa]$. In mathematical terms, $N_{t,\kappa}$, the notional of tranche $\kappa$ at 
time $t$, is given by 

$$N_{t,\kappa} = N_\kappa(L_t) \quad\text{with}\quad N_\kappa(l) = \begin{cases} K_\kappa - K_{\kappa-1} & \text{for } l < K_{\kappa-1}, \\ K_\kappa - l & \text{for } l \in [K_{\kappa-1}, K_\kappa], \\ 0 & \text{for } l > K_\kappa. \end{cases} \tag{12.1}$$

Note that $N_\kappa(l) = (K_\kappa - l)^+ - (K_{\kappa-1} - l)^+$, so $N_{t,\kappa}$ is equal to the sum of a long 
position in a put option on $L_t$ with strike price $K_\kappa$ and a short position in a put on 
$L_t$ with strike price $K_{\kappa-1}$. Such positions are also known as put spreads. 
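
The equivalence between the piecewise definition (12.1) and the put-spread 
representation can be checked numerically. The following Python sketch does this on a 
small grid of loss levels; the attachment points and the function names are 
hypothetical choices made for illustration. 

```python
def tranche_notional(loss, k_lower, k_upper):
    """Piecewise definition (12.1) of the notional N_kappa(l)."""
    if loss < k_lower:
        return k_upper - k_lower
    if loss <= k_upper:
        return k_upper - loss
    return 0.0


def put_spread(loss, k_lower, k_upper):
    """(K_kappa - l)^+ - (K_{kappa-1} - l)^+."""
    return max(k_upper - loss, 0.0) - max(k_lower - loss, 0.0)


# Hypothetical attachment points 0 = K_0 < K_1 < K_2 < K_3.
attachment = [0.0, 20.0, 40.0, 60.0]
layers = list(zip(attachment[:-1], attachment[1:]))

for loss in [0.0, 10.0, 20.0, 25.0, 40.0, 55.0, 70.0]:
    for k_lo, k_up in layers:
        # Both representations agree at every loss level on the grid.
        assert abs(tranche_notional(loss, k_lo, k_up)
                   - put_spread(loss, k_lo, k_up)) < 1e-12

# Notionals of the three tranches after a cumulative loss of 25.
print([tranche_notional(25.0, lo, up) for lo, up in layers])  # [0.0, 15.0, 20.0]
```

With a cumulative loss of 25, the first tranche is wiped out, the second has lost 5 
of its 20 units of notional, and the third is untouched. 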

We assume that in a stylized CDO the pay-off of tranche $\kappa$ is equal to $N_{T,\kappa}$, the 
value of the tranche notional at the maturity date $T$. In Figure 12.2 we have graphed 
the pay-off for a stylized CDO with maturity $T = 5$ years and three tranches 
(equity, mezzanine, senior) on a homogeneous portfolio of $m = 1000$ firms, each 
with exposure one unit and loss given default $\delta_i = 0.5$. The attachment points are 
$K_1 = 20$, $K_2 = 40$, $K_3 = 60$, corresponding to 2%, 4% and 6% of the overall 
exposure; tranches with higher attachment points are ignored. We have plotted two 
distributions for $L_T$: first, a loss distribution corresponding to a five-year default 
probability of 5% and a five-year default correlation of 2%; second, a loss distribution 
with a five-year default probability of 5% but with independent defaults. In both cases 
the expected loss is given by $E(L_T) = 25$. Figure 12.2 illustrates how the value of 
different CDO tranches depends on the extent of the dependence between default 
events. 


Figure 12.2. Pay-off of a stylized CDO contract and distribution of the five-year loss $L_5$ for 
a five-year default probability of 5% and different default correlations. Detailed explanations 
are given in the text. 


• For independent defaults, $L_T$ is typically close to its mean due to diversification 
effects within the portfolio. It is therefore quite unlikely that a tranche 
$\kappa$ with lower attachment point $K_{\kappa-1}$ substantially larger than $E(L_T)$ (such 
as the senior tranche in Figure 12.2) suffers a loss, so the value of such a 
tranche is quite high. On the other hand, since the upper attachment point $K_1$ 
of the equity tranche is lower than $E(L_T) = 25$, it is quite unlikely that $L_T$ 
is substantially smaller than $K_1$, and the value of the equity tranche is low. 


• If defaults are dependent, diversification effects in the portfolio are less 
pronounced. Realizations of the loss $L_T$ larger than the lower attachment point $K_2$ 
of the senior tranche are more likely, as are realizations of $L_T$ smaller than 
the upper attachment point $K_1$ of the equity tranche. This reduces the value 
of tranches with high seniority and increases the value of the equity tranche 
compared with the case of independent defaults. 


• The impact of changing default correlations on mezzanine tranches is unclear 
and cannot be predicted up front. 


The relationship between default dependence and the value of CDO tranches carries 
over to the more complex structures that are actually traded, so dependence mod- 
elling is a key issue in any model for pricing CDO tranches (see also Section 12.3.2 
below). 
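
The correlation sensitivity described above can be reproduced with a short exact 
calculation. The text does not state which model generated the dependent loss 
distribution in Figure 12.2, so the Python sketch below makes an assumption: defaults 
follow an exchangeable Bernoulli mixture model with a beta mixing distribution, for 
which the default probability is $a/(a+b)$ and the default correlation is $1/(a+b+1)$. 
The expected pay-offs $E(N_{T,\kappa})$ are undiscounted, and all function names are 
hypothetical. 

```python
from math import exp, lgamma, log


def log_beta(x, y):
    return lgamma(x) + lgamma(y) - lgamma(x + y)


def log_binom_coeff(n, k):
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def binomial_pmf(n, p):
    """Distribution of the number of defaults for independent defaults."""
    return [exp(log_binom_coeff(n, k) + k * log(p) + (n - k) * log(1 - p))
            for k in range(n + 1)]


def beta_binomial_pmf(n, a, b):
    """Number of defaults when the default probability is Beta(a, b) distributed
    (an assumed model for the dependent case)."""
    return [exp(log_binom_coeff(n, k) + log_beta(k + a, n - k + b) - log_beta(a, b))
            for k in range(n + 1)]


def expected_tranche_values(pmf, lgd, attachment):
    """E(N_{T,kappa}) for each tranche, with N given by the put spread (12.1)."""
    values = []
    for k_lo, k_up in zip(attachment[:-1], attachment[1:]):
        values.append(sum(
            prob * (max(k_up - lgd * k, 0.0) - max(k_lo - lgd * k, 0.0))
            for k, prob in enumerate(pmf)))
    return values


# Portfolio and tranche data from the text.
m, pd_5y, lgd = 1000, 0.05, 0.5
attachment = [0.0, 20.0, 40.0, 60.0]
rho_default = 0.02
# Beta mixing: pi = a / (a + b) and default correlation = 1 / (a + b + 1).
a_plus_b = 1.0 / rho_default - 1.0
a, b = pd_5y * a_plus_b, (1.0 - pd_5y) * a_plus_b

independent = expected_tranche_values(binomial_pmf(m, pd_5y), lgd, attachment)
dependent = expected_tranche_values(beta_binomial_pmf(m, a, b), lgd, attachment)
for name, v_ind, v_dep in zip(["equity", "mezzanine", "senior"],
                              independent, dependent):
    print(f"{name:10s} independent: {v_ind:7.3f}   correlated: {v_dep:7.3f}")
```

Under these assumptions the expected pay-off of the equity tranche is larger, and that 
of the senior tranche smaller, in the dependent case than under independence, in 
line with the qualitative discussion above. 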


Pricing and the role of rating agencies. Before the financial crisis, rating agen- 
cies played a dominant role in the valuation of asset-backed CDOs. In fact, many 
CDO investors lacked the necessary sophistication and data to form an independent 
judgement of the riskiness of asset-backed CDOs, a problem that was compounded 
by the complex waterfall structure of most CDO issues. They therefore based their 
investment decision solely on the risk assessment of the rating agencies. This is 
particularly true for AAA-rated tranches, which appeared to be attractive investment 
opportunities due to the relatively high offered yield (compared with the yield that 
could be earned on a standard AAA-rated bond, for example). 

In relying on ratings, investors implicitly assumed that a high-quality rating such 
as AAA for a CDO tranche meant that the tranche had a similar risk profile to a 
AAA-rated bond. This perception is clearly wrong; since the loss distribution of an 
asset-backed CDO tranche is extremely sensitive with respect to the credit quality 
and the default correlation of the mortgages in the underlying asset pool, ratings for 
CDO tranches change rapidly with changes in these parameters. Moreover, while 
rating agencies have a lot of experience in rating corporate and sovereign debt, their 
experience with CDOs and other structured credit products was quite limited. As a 
result, ratings for CDO tranches turned out to be very unstable. In fact, at the onset 
of the crisis the rating of a large proportion of the traded ABS-CDOs (CDOs where 
the underlying asset pool consists of mortgage-based ABSs) was downgraded from 
investment grade to speculative grade, including default, within a very short period 
of time (Crouhy, Jarrow and Turnbull 2008). This massive rating change has sparked 
an intense debate about the appropriateness of rating methodologies, and about the 
role and the incentives of rating agencies more generally. We refer to Notes and 
Comments for further reading. 


CDO-squared contracts. CDO-squared contracts are CDOs where the underlying 
asset pool itself consists of CDO tranches. These products are very complex and 
difficult, if not impossible, to value. For this reason they never became particularly 
popular on markets for synthetic CDOs. The situation is different in markets for 
asset-backed CDOs. Before the crisis there was intense trading activity in ABS- 
CDOs. The main reason for this was the fact that these products seemed to offer a 
way to create additional AAA-rated securities from the mezzanine tranches of the 
original ABSs, thus satisfying the high demand for AAA-rated securities. Investors 
in highly rated ABS-CDOs incurred severe losses during the credit crisis and, as 
shown by Hull and White (2010), the AAA rating carried by many of these structures 
was very hard to defend in retrospect. Many studies since the financial crisis have 
emphasized the need for simpler and more standardized financial products: see, for 
example, Crouhy, Jarrow and Turnbull (2008) and Hull (2009). ABS-CDOs are 
clearly a prime case in point. 


12.1.2 Credit Indices and Index Derivatives 


Credit index derivatives are standardized credit products whose pay-off is deter- 
mined by the occurrence of credit events in a fixed pool of major firms that form the 
so-called credit index. A key requirement for the inclusion of a firm in a credit index 
is the existence of a liquid single-name CDS market in that firm. The availability of 
indices has helped to create a liquid market for certain credit index derivatives that 
has become a useful benchmark for model calibration and an important reference 
point for academic studies. 

At present there are two major families of credit indices: the CDX family and 
the iTraxx family. CDX indices refer to American companies and iTraxx indices 
refer either to European firms or to Asian and Australian firms. Characteristics of 
the main credit indices are given in Table 12.1. In order to reflect changes in the 
credit quality of the constituents, the composition of most credit indices changes 
every six months at the so-called roll dates (20 March and 20 September), and the 
pools corresponding to the roll dates are known as the different series of the index. 
Products on older series continue to trade but the market for products related to the 
current series is by far the most liquid. 

Table 12.1. Composition of main credit indices (taken from Couderc and Finger (2010)). 
The most important indices are the iTraxx Europe and CDX.NA.IG indices. 

Name              Pool size   Region          Credit quality 
CDX.NA.IG         125         North America   Investment grade 
CDX.NA.IG.HVOL    30          North America   Low-quality investment grade 
CDX.NA.HY         100         North America   Non-investment-grade 
iTraxx Europe     125         Europe          Investment grade 
iTraxx Europe     30          Europe          Low-quality investment grade 

Standardized index derivatives are credit index swaps and single-tranche CDOs 
with a standardized set of attachment points. The cash flow of these products bears 
some similarities to the cash flows of a single-name CDS as described in Sec- 
tions 10.1.4 and 10.4.4. Each contract consists of a premium payment leg (pay- 
ments made by the protection buyer) and a default payment leg (payments made 
by the protection seller). Premium payments are due at deterministic time points 
$0 < t_1 < \cdots < t_N = T$, where $T$ is the maturity of the contract. Standardized 
index derivatives have quarterly premium payments, i.e. $t_n - t_{n-1} = 0.25$; the time 
to maturity at issuance is three, five, seven or ten years, with five-year products 
being the most liquid. 

Next we describe the pay-offs of an index swap and a CDO tranche. We consider 
a fixed pool of m names (m = 125 for derivatives related to the iTraxx Europe and 
CDX.NA.IG indices) and we normalize the exposure of each firm to 1, so that the 
cumulative portfolio loss at time $t$ equals $L_t = \sum_{i=1}^m \delta_i Y_{t,i}$. 


Table 12.2. Standardized attachment points for single-tranche 
CDOs on the CDX.NA.IG and iTraxx Europe indices. 

CDX.NA.IG        0-3%   3-7%   7-10%   10-15%   15-30% 
iTraxx Europe    0-3%   3-6%   6-9%    9-12%    12-20% 


Credit index swaps. At a default time $T_k \le T$ there is a default payment of size $\delta_{\xi_k}$, 
where $\xi_k \in \{1, \dots, m\}$ is the identity of the name defaulting at $T_k$. The cumulative 
cash flows of the default payment leg up to time $t \le T$ (received by the protection 
buyer) are therefore given by 

$$\sum_{T_k \le t} \delta_{\xi_k} = \sum_{\tau_i \le t} \delta_i = \sum_{i=1}^m \delta_i Y_{t,i} = L_t.$$

Given an annualized swap spread $x$, the premium payment at time $t_n$ (received by 
the protection seller) is given by 

$$x (t_n - t_{n-1}) N^{\mathrm{Ind}}_{t_n},$$

where the notional $N^{\mathrm{Ind}}_t$ of the index swap is equal to the number of surviving firms 
at time $t$: that is, $N^{\mathrm{Ind}}_t = m - \sum_{i=1}^m Y_{t,i}$. This definition of the notional reflects the 
fact that at a default time the defaulting entity is removed from the index. Moreover, 
at a default time $T_k \in (t_{n-1}, t_n]$, the protection buyer pays the protection seller the 
part of the premium that has accrued since the last regular premium payment date: 
that is, the quantity $x (T_k - t_{n-1})$. A credit event therefore has a double effect on the 
cash-flow structure of the index swap: it leads to a default payment and it reduces 
future premium payments. 
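
The cash flows of a credit index swap can be traced for a single scenario of default 
times with the following Python sketch. It is only an illustration of the pay-off 
description: the function name, the pool of five names and all numerical values are 
hypothetical. 

```python
def index_swap_cash_flows(default_times, lgds, spread, payment_dates):
    """Cash flows of a credit index swap on a pool with unit exposures.

    Returns two lists of (time, amount) pairs: the default payment leg
    (received by the protection buyer) and the premium payment leg
    (received by the protection seller), including accrued premiums.
    """
    m = len(default_times)
    maturity = payment_dates[-1]
    default_leg = sorted((tau, lgd) for tau, lgd in zip(default_times, lgds)
                         if tau <= maturity)
    premium_leg = []
    previous = 0.0
    for t_n in payment_dates:
        # Accrued premium x * (T_k - t_{n-1}) for each default in (t_{n-1}, t_n].
        for tau in default_times:
            if previous < tau <= t_n:
                premium_leg.append((tau, spread * (tau - previous)))
        # Regular premium on the notional N_t^Ind = number of surviving firms.
        survivors = m - sum(1 for tau in default_times if tau <= t_n)
        premium_leg.append((t_n, spread * (t_n - previous) * survivors))
        previous = t_n
    return default_leg, sorted(premium_leg)


# Hypothetical example: five names, one-year swap with quarterly premiums.
payment_dates = [0.25 * n for n in range(1, 5)]
default_times = [0.4, 3.0, 0.9, 7.0, 2.5]
default_leg, premium_leg = index_swap_cash_flows(
    default_times, lgds=[0.6] * 5, spread=0.01, payment_dates=payment_dates)
print(default_leg)
print(premium_leg)
```

In this scenario two names default before maturity. Each default triggers a default 
payment and an accrued premium payment, and it lowers the notional on which all 
later regular premiums are computed. 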


Single-tranche CDOs. A single-tranche CDO on the reference portfolio is 
characterized by fixed lower and upper attachment points $0 \le l < u \le 1$, expressed as 
percentages of the overall notional $m$ of the index pool. As in (12.1) we define the 
notional $N_t^{[l,u]}$ of the tranche by a put spread: 

$$N_t^{[l,u]} = (um - L_t)^+ - (lm - L_t)^+. \tag{12.2}$$

In particular, for $L_0 = 0$ the initial notional of the tranche is equal to $m(u - l)$. The 
cumulative tranche loss up to time $t$ is then given by 

$$L_t^{[l,u]} = m(u - l) - N_t^{[l,u]} = (L_t - lm)^+ - (L_t - um)^+, \tag{12.3}$$

so the tranche loss can be viewed as a call spread on the cumulative portfolio loss. 
At a default time $T_k \le T$ the protection seller makes a default payment of size 

$$\Delta L_{T_k}^{[l,u]} = L_{T_k}^{[l,u]} - L_{T_k-}^{[l,u]}. \tag{12.4}$$

Again the premium payment leg consists of regular and accrued premium payments. 
Given an annualized tranche spread $x$, the regular premium payment at date $t_n$ 
is given by $x (t_n - t_{n-1}) N_{t_n}^{[l,u]}$. The accrued payment at a default time $T_k \in (t_{n-1}, t_n]$ 
equals $x (T_k - t_{n-1}) \Delta L_{T_k}^{[l,u]}$. In order to simplify the exposition we will usually omit 
accrued premium payments below. 
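
The following Python sketch evaluates the tranche notional (12.2), the tranche loss 
(12.3) and the default payments (12.4) as defaults accumulate in the pool. The pool 
size $m = 125$ is taken from the text; the common LGD of 0.6, the choice of a 3-6% 
tranche and the function names are assumptions made for illustration. 

```python
def tranche_notional(loss, lower, upper, m):
    """N_t^{[l,u]} = (u*m - L_t)^+ - (l*m - L_t)^+, see (12.2)."""
    return max(upper * m - loss, 0.0) - max(lower * m - loss, 0.0)


def tranche_loss(loss, lower, upper, m):
    """L_t^{[l,u]} = (L_t - l*m)^+ - (L_t - u*m)^+, see (12.3)."""
    return max(loss - lower * m, 0.0) - max(loss - upper * m, 0.0)


m, lgd = 125, 0.6            # pool size as in the text; the LGD is an assumed value
lower, upper = 0.03, 0.06    # a 3-6% mezzanine tranche

previous_loss = 0.0
for n_defaults in range(0, 16):
    portfolio_loss = lgd * n_defaults
    notional = tranche_notional(portfolio_loss, lower, upper, m)
    loss = tranche_loss(portfolio_loss, lower, upper, m)
    # Identity (12.3): tranche loss plus remaining notional equals m * (u - l).
    assert abs(notional + loss - m * (upper - lower)) < 1e-9
    default_payment = loss - previous_loss   # jump of the tranche loss, see (12.4)
    print(n_defaults, round(notional, 2), round(default_payment, 2))
    previous_loss = loss
```

With these inputs the first six defaults leave the tranche untouched, the seventh 
default produces the first default payment, and the tranche is exhausted once the 
portfolio loss exceeds $um = 7.5$. 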

Single-tranche CDOs are sometimes called synthetic, as there is no physical trans- 
fer of credit-risky securities from the protection seller to the protection buyer, in 
contrast to asset-backed CDOs. There is a standardized set of attachment points for 
index tranches on the iTraxx Europe and CDX.NA.IG indices (see Table 12.2). In 
analogy with the terminology used for asset-backed CDOs, the tranche with lower 
attachment point $l = 0$ is known as the equity tranche; the equity tranche is clearly 
affected by the first losses in the underlying pool. The tranche with the highest 
attachment point is termed the senior tranche and the other tranches are known 
as mezzanine tranches of differing seniority. Tranches with non-standard maturity 
dates or attachment points and tranches on portfolios other than the constituents of 
a popular credit index are known as bespoke CDO tranches. 


12.1.3 Basic Pricing Relationships for Index Swaps and CDOs 


In this section we discuss some elementary model-independent pricing relationships 
for index swaps and CDOs that will be useful in this chapter. In our analysis we do 
not specify the underlying portfolio credit risk model in any detail; rather we take 
as given the joint distribution of the default times and of the LGDs calculated under 
some risk-neutral pricing measure $Q$. For simplicity we set the valuation date equal 
to $t = 0$. Moreover, we assume that the default-free interest rate $r(t)$ is deterministic, 
and we denote by 

$$p_0(0, t) = \exp\Bigl(-\int_0^t r(s)\,ds\Bigr), \quad t \ge 0,$$

the default-free discount factors or zero-coupon bond prices as seen from time $t = 0$. 
Deterministic interest rates are assumed in most of the literature on portfolio credit 
risk models, essentially because the additional complexity of stochastic interest rates 
is not warranted given the large amount of uncertainty surrounding the modelling 
of default dependence. 


CDS index swaps. The market value $V^{\mathrm{Def}}$ of the default payments of a CDS index 
swap is given by the $Q$-expectation of the associated discounted cash-flow stream. 
The latter is given by $\sum_{T_k \le T} p_0(0, T_k) \Delta L_{T_k}$, where $\Delta L_{T_k} = L_{T_k} - L_{T_k-} = \delta_{\xi_k}$. 
Since this sum may be written more succinctly as $\int_0^T p_0(0, t)\,dL_t$ we get 

$$V^{\mathrm{Def}} = E^Q\Bigl(\int_0^T p_0(0, t)\,dL_t\Bigr) = \sum_{i=1}^m E^Q\Bigl(\int_0^T p_0(0, t)\,dL_{t,i}\Bigr), \tag{12.5}$$

where $L_{t,i} = \delta_i Y_{t,i}$ is the cumulative loss process of firm $i$. 
Given a generic spread $x$, the market value of the premium payments is given by 
$V^{\mathrm{Prem}}(x)$, where 

$$V^{\mathrm{Prem}}(x) = x \sum_{n=1}^N p_0(0, t_n)(t_n - t_{n-1}) E^Q\bigl(N^{\mathrm{Ind}}_{t_n}\bigr) = x \sum_{i=1}^m \sum_{n=1}^N p_0(0, t_n)(t_n - t_{n-1}) E^Q(1 - Y_{t_n,i}). \tag{12.6}$$

Clearly, $V^{\mathrm{Prem}}(x) = x V^{\mathrm{Prem}}(1)$. 

The market value at $t = 0$ of a protection buyer position in an index swap with 
given spread $x$ is thus given by $V^{\mathrm{Def}} - V^{\mathrm{Prem}}(x)$. As in the case of single-name 
CDSs, the fair index swap spread $x^{\mathrm{Ind}}$ of the contract at a given point in time is set 
such that the market value of the contract at that date is 0. This leads to the formula 

$$x^{\mathrm{Ind}} = \frac{V^{\mathrm{Def}}}{V^{\mathrm{Prem}}(1)}. \tag{12.7}$$

Next we consider the relationship between the index swap spread $x^{\mathrm{Ind}}$ and the 
fair CDS spread $x^i$ for the single-name CDSs on the constituents of the index. It is 
tempting to conclude that the index swap spread is simply the arithmetic average of 
the $x^i$, but this is wrong in general as the following analysis shows. For a single-name 
CDS on firm $i$ with identical maturity and premium payment dates as the 
index swap, one has $V_i^{\mathrm{Def}} = V_i^{\mathrm{Prem}}(x^i)$ with 

$$V_i^{\mathrm{Def}} = E^Q\Bigl(\int_0^T p_0(0, t)\,dL_{t,i}\Bigr),$$

$$V_i^{\mathrm{Prem}}(x) = x \sum_{n=1}^N p_0(0, t_n)(t_n - t_{n-1}) E^Q(1 - Y_{t_n,i}), \quad x > 0.$$

Comparing these relations with (12.5) and (12.6), we see that $V^{\mathrm{Def}} = \sum_{i=1}^m V_i^{\mathrm{Def}}$ 
and $V^{\mathrm{Prem}}(x) = \sum_{i=1}^m V_i^{\mathrm{Prem}}(x)$. For the fair index spread we therefore obtain 

$$x^{\mathrm{Ind}} = \frac{V^{\mathrm{Def}}}{V^{\mathrm{Prem}}(1)} = \frac{\sum_{i=1}^m V_i^{\mathrm{Def}}}{\sum_{i=1}^m V_i^{\mathrm{Prem}}(1)} = \frac{\sum_{i=1}^m x^i V_i^{\mathrm{Prem}}(1)}{\sum_{i=1}^m V_i^{\mathrm{Prem}}(1)} = \sum_{i=1}^m w_i x^i, \tag{12.8}$$

with weights given by $w_i := V_i^{\mathrm{Prem}}(1) / \bigl(\sum_{j=1}^m V_j^{\mathrm{Prem}}(1)\bigr)$. The index spread is 
therefore indeed a weighted average of the single-name CDS spreads, but the weights 
are in general not equal to $1/m$. In fact, if firm $i$ is of high credit quality and firm $j$ 
of relatively low credit quality, one has 

$$E^Q(1 - Y_{t,i}) = Q(\tau_i > t) > Q(\tau_j > t) = E^Q(1 - Y_{t,j}), \quad t > 0.$$

This implies that $V_i^{\mathrm{Prem}}(1) > V_j^{\mathrm{Prem}}(1)$ and hence that $w_i > w_j$, so that 
high-quality firms have a larger weight than low-quality firms. Of course, in the special 
case where all $\tau_i$ have the same distribution we get $w_1 = \cdots = w_m = 1/m$. An 
example is given by the simple model in which the default times are exponentially 
distributed with identical hazard rate $\gamma^Q$ so that $Q(\tau_i > t) = e^{-\gamma^Q t}$, and in which 
the LGD is deterministic and identical across firms. In that case we have 
$x^1 = \cdots = x^m = x^{\mathrm{Ind}}$. Moreover, the parameter $\gamma^Q$ can be calibrated from a 
market-observed index spread $x^*$ using the same procedure as in the case of single-name 
CDS spreads (see Section 10.4.4). This setup is frequently employed by practitioners 
in the computation of implied correlations for single-tranche CDOs. 
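
The weighted-average relationship (12.8) can be illustrated numerically. The Python 
sketch below assumes independent inputs that are not part of the text: exponentially 
distributed default times with firm-specific constant hazard rates, a flat interest 
rate, a common deterministic LGD and no accrued premiums. Under these assumptions 
both legs of a single-name CDS are available in closed form. 

```python
from math import exp


def single_name_legs(hazard, lgd, rate, payment_dates):
    """Default-leg value and value of a unit-spread premium leg for one name.

    Assumes an exponential default time with constant hazard rate, a flat
    interest rate and no accrued premium payments.
    """
    maturity = payment_dates[-1]
    total = rate + hazard
    # V_i^Def = lgd * int_0^T exp(-r t) * hazard * exp(-hazard t) dt
    v_def = lgd * hazard * (1.0 - exp(-total * maturity)) / total
    # V_i^Prem(1) = sum_n p_0(0, t_n) (t_n - t_{n-1}) Q(tau_i > t_n)
    v_prem, previous = 0.0, 0.0
    for t_n in payment_dates:
        v_prem += exp(-rate * t_n) * (t_n - previous) * exp(-hazard * t_n)
        previous = t_n
    return v_def, v_prem


payment_dates = [0.25 * n for n in range(1, 21)]   # quarterly, T = 5 years
hazards = [0.005, 0.01, 0.02, 0.05, 0.10]          # hypothetical names
lgd, rate = 0.6, 0.02                              # assumed values

legs = [single_name_legs(h, lgd, rate, payment_dates) for h in hazards]
cds_spreads = [v_def / v_prem for v_def, v_prem in legs]
total_prem = sum(v_prem for _, v_prem in legs)
weights = [v_prem / total_prem for _, v_prem in legs]

index_spread = sum(v_def for v_def, _ in legs) / total_prem          # (12.7)
weighted_average = sum(w * x for w, x in zip(weights, cds_spreads))  # (12.8)
arithmetic_average = sum(cds_spreads) / len(cds_spreads)

print(f"index spread       {1e4 * index_spread:8.2f} bp")
print(f"weighted average   {1e4 * weighted_average:8.2f} bp")
print(f"arithmetic average {1e4 * arithmetic_average:8.2f} bp")
```

Because the weights decrease with the hazard rate while the single-name spreads 
increase with it, the index spread in this example lies below the arithmetic average 
of the single-name spreads. 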


Single-tranche CDOs. Finally, we provide a more explicit description for the value 
of a single-tranche CDO. We begin with the premium payments. According to the 
definition of the tranche notional in (12.2), the market value of the regular premium 
payments for a generic CDO spread $x$ is given by $V^{\mathrm{Prem}}(x)$, where 

$$V^{\mathrm{Prem}}(x) = x \sum_{n=1}^N p_0(0, t_n)(t_n - t_{n-1}) E^Q\bigl((um - L_{t_n})^+ - (lm - L_{t_n})^+\bigr). \tag{12.9}$$

Concerning the default payments we note from (12.4) that the discounted cash-flow 
stream of the default payment leg is given by 

$$\sum_{T_k \le T} p_0(0, T_k) \Delta L_{T_k}^{[l,u]} = \int_0^T p_0(0, t)\,dL_t^{[l,u]}. \tag{12.10}$$

The integral on the right can be approximated by a Riemann sum, using the premium 
payment dates as gridpoints: 

$$\int_0^T p_0(0, t)\,dL_t^{[l,u]} \approx \sum_{n=1}^N p_0(0, t_n)\bigl(L_{t_n}^{[l,u]} - L_{t_{n-1}}^{[l,u]}\bigr). \tag{12.11}$$

In economic terms this approximation means that losses occurring during the period 
$[t_{n-1}, t_n]$ are paid only at time $t_n$ (and are therefore discounted slightly more 
heavily, since $p_0(0, t_n) \le p_0(0, t)$ for $t \le t_n$ when interest rates are non-negative). 
Since in practice premium payments typically occur quarterly, the error 
of this approximation is negligible. Recall that $L_t^{[l,u]}$ is a function of $L_t$, namely 
$L_t^{[l,u]} = v^{[l,u]}(L_t) := (L_t - lm)^+ - (L_t - um)^+$. Hence, with a slight abuse of 
notation we have 

$$V^{\mathrm{Def}} = \sum_{n=1}^N p_0(0, t_n)\bigl(E^Q(v^{[l,u]}(L_{t_n})) - E^Q(v^{[l,u]}(L_{t_{n-1}}))\bigr). \tag{12.12}$$

Summarizing, we find that the evaluation of (12.12) and (12.9), and hence the 
determination of CDO spreads, reduces to computing call or put option prices on the 
cumulative loss process at the premium payment dates $t_1, \dots, t_N$. 
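
To make the reduction to option prices on the loss process concrete, the following 
Python sketch evaluates (12.9) and (12.12) and returns the spread that equates the 
two legs. The loss distribution used here is a deliberately simple assumption that 
is not proposed in the text: defaults are independent with a common constant hazard 
rate, so the number of defaults at each date is binomial. The flat interest rate, the 
LGD and the function names are likewise hypothetical, and accrued premiums are omitted. 

```python
from math import comb, exp


def expected_tranche_loss(p_default, m, lgd, lower, upper):
    """E(v^{[l,u]}(L_t)) when the number of defaults is Binomial(m, p_default)."""
    total = 0.0
    for k in range(m + 1):
        prob = comb(m, k) * p_default**k * (1.0 - p_default)**(m - k)
        loss = lgd * k
        total += prob * (max(loss - lower * m, 0.0) - max(loss - upper * m, 0.0))
    return total


def fair_tranche_spread(hazard, m, lgd, lower, upper, rate, payment_dates):
    """Ratio of (12.12) to (12.9) evaluated at x = 1, without accrued premiums."""
    v_def, v_prem = 0.0, 0.0
    previous_date, previous_loss = 0.0, 0.0
    for t_n in payment_dates:
        discount = exp(-rate * t_n)            # p_0(0, t_n) for a flat rate
        p_default = 1.0 - exp(-hazard * t_n)   # Q(tau_i <= t_n)
        loss = expected_tranche_loss(p_default, m, lgd, lower, upper)
        v_def += discount * (loss - previous_loss)
        # Expected notional = m (u - l) - expected tranche loss, by (12.3).
        v_prem += discount * (t_n - previous_date) * (m * (upper - lower) - loss)
        previous_date, previous_loss = t_n, loss
    return v_def / v_prem


payment_dates = [0.25 * n for n in range(1, 21)]   # quarterly, T = 5 years
for lower, upper in [(0.00, 0.03), (0.03, 0.06)]:
    spread = fair_tranche_spread(0.01, 125, 0.6, lower, upper, 0.02, payment_dates)
    print(f"{lower:.0%}-{upper:.0%} tranche: fair spread {1e4 * spread:10.1f} bp")
```

Replacing `expected_tranche_loss` by the corresponding expectation under a model with 
dependent defaults changes the tranche spreads but leaves the structure of the 
calculation unchanged. 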


Notes and Comments 


There are many contributions that discuss the pros and cons of securitization in the 
light of the subprime credit crisis of 2007 and 2008. Excellent descriptions of the 
events surrounding the crisis—including discussions of steps that should be imple- 
mented to prevent a repeat of it—are given by Crouhy, Jarrow and Turnbull (2008), 
Hull (2009) and, in an insurance context, Donnelly and Embrechts (2010) (see also 
Das, Embrechts and Fasen 2013). Hull and White (2010) test the ratings given to 
ABSs and ABS-CDOs before the crisis. They find that, whereas the AAA ratings 
assigned to ABSs were not unreasonable, the AAA ratings assigned to tranches of 
CDOs created from mezzanine tranches of ABSs cannot be justified by any proper 
quantitative analysis. 

In his analysis of the credit crisis, Brunnermeier (2009) is particularly concerned 
with the various transmission mechanisms that caused losses in the relatively small 
American subprime mortgages market to be amplified in such a way that they created 
a global financial crisis. An interesting discussion of securitization in the light of the 
subprime crisis from a regulatory viewpoint is the well-known Turner Review (Lord 
Turner 2009). Incentive problems in the securitization of mortgages are discussed in 
Franke and Kahnen (2009). A more technical analysis of the value of securitization as 
a risk-management tool can be found in Frey and Seydel (2010). From the multitude 
of books we single out Dewatripont, Rochet and Tirole (2010) and Shin (2010). We 
also suggest that the interested reader look at the various documents published 
by the Bank for International Settlements and the Basel Committee on Banking 
Supervision (see www.bis.org/bcbs). 

For alternative textbook treatments of CDOs and related index derivatives we 
refer to Bluhm and Overbeck (2007) and Brigo, Pallavicini and Torresetti (2010). 
Both texts contain a wealth of institutional details on CDO markets. The book by 
Bluhm and Overbeck (2007) also discusses so-called basket default swaps, or, more 
technically, kth-to-default swaps. These products offer protection against the kth 
default in a portfolio with m > k obligors (the basket). As in the case of an ordinary 
CDS, the premium payments on a kth-to-default swap take the form of a periodic 
payment stream, which stops at the kth default time $T_k$. The default payment is 
triggered if $T_k$ is smaller than the maturity date of the swap. While first-to-default 
swaps are fairly common, higher-order default swaps are encountered only rarely. 


12.2 Copula Models 


Copula models are widely used in practice for the pricing of CDO tranches and 
basket credit derivatives. In this section we discuss this important class of models 
with a particular focus on models where the copula has a factor structure. 


12.2.1 Definition and Properties 


Definition 12.1 (copula model for default times). Let $C$ be a copula and let $\gamma_i(t)$, 
$1 \le i \le m$, be nonnegative functions such that $\Gamma_i(t) := \int_0^t \gamma_i(s)\,ds < \infty$ for all 
$t > 0$ and $\lim_{t \to \infty} \Gamma_i(t) = \infty$. The default times $\tau_1, \dots, \tau_m$ then follow a copula 
model with survival copula $C$ and marginal hazard functions $\gamma_i(t)$ if their joint 
survival function can be written in the form 

$$\bar F(t_1, \dots, t_m) = C\bigl(e^{-\Gamma_1(t_1)}, \dots, e^{-\Gamma_m(t_m)}\bigr). \tag{12.13}$$


The marginal survival functions in a copula model for defaults are obviously given 
by $\bar F_i(t) = e^{-\Gamma_i(t)}$. The marginal survival functions and marginal dfs are continuous, 
and it follows from Sklar's Theorem that both the copula and the survival copula 
$C$ of the vector of default times $\tau := (\tau_1, \dots, \tau_m)'$ are unique. Of course, the joint 
distribution of the default times could also be described in terms of the copula of $\tau$ 
and the marginal distribution functions $F_1, \dots, F_m$. We focus on survival copulas 
as this ties in with a large part of the literature. If $C$ in (12.13) is radially symmetric 
(see Definition 7.15), then $C$ is also the copula of $\tau$. This is true, in particular, if $C$ 
is an elliptical copula, such as the Gauss copula or the t copula. 

From a mathematical point of view it makes no difference whether we specify the 
survival copula and the marginal hazard functions separately or whether we specify 
the joint survival function $\bar F$ directly; every joint survival function $\bar F$ with absolutely 
continuous marginal distributions has a unique representation of the form (12.13). 
However, the representation (12.13) is convenient for the calibration of the model 
to prices of traded credit derivatives. The usual calibration process is carried out in 
two steps, as we now explain. 

In the first step, the marginal hazard functions are calibrated to a given term struc- 
ture of CDS spreads or spreads of defaultable bonds, as described in Section 10.4.4. 
In the second step, the parameters of the survival copula C are calibrated to the 
observed prices of traded portfolio credit derivatives, most notably CDO index 
tranches. The two-stage calibration is feasible because the second step has no effect 
on the parameters calibrated in the first step. This is very advantageous from a 
numerical point of view, which is one of the reasons for the popularity of copula models 
among practitioners. We return to the calibration of copula models in Section 12.3. 

In order to link our analysis to the discussion of copulas in threshold models in 
Section 11.1, we take a brief look at the one-period portfolio model that is implied 
by a copula model for the default times $\tau_1, \dots, \tau_m$. We fix a horizon $T$ and set 
$Y_i := Y_{T,i}$. By definition, $Y_i = 1$ if and only if $\tau_i \le T$, so the one-period model has 
the threshold representation $(\tau, (T, \dots, T)')$. The dependence of the default events 
in the one-period model is governed by the dependence structure of the critical 
variables $\tau_1, \dots, \tau_m$. Note that in Section 11.1 this dependence was described in 
terms of the copula of the critical variables, whereas in this section we prefer to 
work with the survival copula of the default times. 

Definition 12.1 immediately leads to an explicit construction of the random times 
$\tau_1, \dots, \tau_m$ via random thresholds; this in turn yields a generic simulation method 
for copula models. Consider a random vector $U \sim C$ and define the default times 
by 

$$\tau_i = \inf\{t \ge 0 : e^{-\Gamma_i(t)} \le U_i\}, \quad 1 \le i \le m, \tag{12.14}$$

so that firm $i$ defaults at the first time point where the marginal survival function 
$\bar F_i(t)$ crosses the random threshold $U_i$. Equation (12.14) yields 

$$P(\tau_1 > t_1, \dots, \tau_m > t_m) = P\bigl(U_1 \le e^{-\Gamma_1(t_1)}, \dots, U_m \le e^{-\Gamma_m(t_m)}\bigr) = C\bigl(e^{-\Gamma_1(t_1)}, \dots, e^{-\Gamma_m(t_m)}\bigr)$$

as required. To generate a realization of $\tau$ we generate a realization of the 
random vector $U$ and construct the $\tau_i$ according to (12.14). An alternative simulation 
algorithm for factor copula models is given at the end of this section. 
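
The threshold construction (12.14) can be sketched in Python as follows. The code 
solves for the crossing time by bisection so that it works for any continuous, 
nondecreasing cumulative hazard function. The specific choices are assumptions made 
only for illustration: an equicorrelated Gauss copula for $U$, hazard rates that 
change value once at $t = 2$, and all function names. 

```python
import random
from math import exp, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def cumulative_hazard(t, hazard_before, hazard_after, switch=2.0):
    """Gamma(t) for a hazard rate that changes value once, at time `switch`."""
    if t <= switch:
        return hazard_before * t
    return hazard_before * switch + hazard_after * (t - switch)


def default_time(u, gamma, t_max=1e4, tol=1e-10):
    """tau = inf{t >= 0 : exp(-Gamma(t)) <= u}, found by bisection, as in (12.14)."""
    lo, hi = 0.0, t_max
    if exp(-gamma(hi)) > u:
        return float("inf")      # threshold not crossed before t_max
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if exp(-gamma(mid)) <= u:
            hi = mid
        else:
            lo = mid
    return hi


def sample_gauss_copula(m, rho, rng):
    """One draw U ~ C for an equicorrelated Gauss copula (an assumed choice of C)."""
    common = rng.gauss(0.0, 1.0)
    return [STD_NORMAL.cdf(sqrt(rho) * common + sqrt(1.0 - rho) * rng.gauss(0.0, 1.0))
            for _ in range(m)]


rng = random.Random(0)
# Hypothetical cumulative hazard functions for three firms.
gammas = [lambda t: cumulative_hazard(t, 0.01, 0.03),
          lambda t: cumulative_hazard(t, 0.02, 0.02),
          lambda t: cumulative_hazard(t, 0.05, 0.10)]
u = sample_gauss_copula(len(gammas), rho=0.3, rng=rng)
taus = [default_time(u_i, g) for u_i, g in zip(u, gammas)]
print(taus)
```

For a constant hazard rate $\gamma_i$ the bisection is unnecessary, since (12.14) then 
reduces to $\tau_i = -\ln(U_i)/\gamma_i$. 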


Factor copula models. Most copula models used in practice have a factor structure. 
By this we mean that the threshold vector $U \sim C$ used in (12.14) has a conditional 
independence structure in the sense of Definition 11.9, i.e. there is a $p$-dimensional 
random vector $V$, $p < m$, such that the $U_i$ are independent given $V$. 

The conditional independence of the $U_i$ allows for an alternative representation
of the joint survival function of the default times in terms of a mixture model.
By conditioning on $V$, it follows from (12.14) that we can write the joint survival
function of $\boldsymbol{\tau}$ as

\[ \bar F(t_1, \dots, t_m)
   = E\bigl(P(U_1 \le \bar F_1(t_1), \dots, U_m \le \bar F_m(t_m) \mid V)\bigr)
   = E\Bigl(\prod_{i=1}^m P(U_i \le \bar F_i(t_i) \mid V)\Bigr). \tag{12.15} \]

Moreover, by the definition of $\tau_i$ in (12.14) we have
$P(\tau_i > t \mid V) = P(U_i \le \bar F_i(t) \mid V)$. The conditional survival function of
$\tau_i$ given $V = v$ therefore satisfies

\[ \bar F_{\tau_i \mid V}(t \mid v) = P(U_i \le \bar F_i(t) \mid V = v), \]




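and we may write $\bar F(t_1, \dots, t_m) = E\bigl(\prod_{i=1}^m \bar F_{\tau_i \mid V}(t_i \mid V)\bigr)$. Denoting the density
or probability mass function of $V$ by $g_V$, we will usually write the joint survival
function in the form

\[ \bar F(t_1, \dots, t_m) = \int_{\mathbb{R}^p} \prod_{i=1}^m \bar F_{\tau_i \mid V}(t_i \mid v)\, g_V(v)\, dv. \tag{12.16} \]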


Note that the representation (12.16) is analogous to the representation of one-period
threshold models with conditional independence structure as Bernoulli mixture
models (see Section 11.2.4). In particular, (12.16) shows that for $T$ fixed the default
indicators follow a Bernoulli mixture model with factor vector $V$ and conditional
default probabilities $p_i(v) = 1 - \bar F_{\tau_i \mid V}(T \mid v)$.

The mixture-model representation of a factor copula model gives rise to the fol- 
lowing generic algorithm for the simulation of default times. 


Algorithm 12.2 (sampling from factor copula models). 


(1) Generate a realization of V. 

(2) Generate independent rvs $\tau_i$ with df $1 - \bar F_{\tau_i \mid V}(t \mid V)$, $1 \le i \le m$.
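The two steps of Algorithm 12.2 can be sketched as follows for a one-factor Gauss copula with a common correlation parameter (see Example 12.3 below). The sketch assumes constant hazard rates, so that the marginal survival function is $\exp(-\gamma_i t)$, and it carries out step (2) by inverting the conditional survival function at a uniform draw. All names and parameter values are illustrative assumptions.

```python
import math
import random
from statistics import NormalDist

N01 = NormalDist()


def sample_default_times(hazards, rho, rng):
    """One draw of (tau_1, ..., tau_m) following the two steps of Algorithm 12.2."""
    # Step (1): realization of the factor V.
    v = rng.gauss(0.0, 1.0)
    taus = []
    for gamma in hazards:
        # Step (2): solve Phi((d(t) - sqrt(rho)*v) / sqrt(1 - rho)) = w for t,
        # where d(t) is the standard normal quantile of exp(-gamma * t).
        w = 1.0 - rng.random()                    # uniform on (0, 1]
        w = min(max(w, 1e-16), 1.0 - 1e-16)       # keep inv_cdf inside its domain
        d = math.sqrt(rho) * v + math.sqrt(1.0 - rho) * N01.inv_cdf(w)
        surv = max(N01.cdf(d), 1e-300)            # marginal survival probability at tau
        taus.append(-math.log(surv) / gamma)      # invert exp(-gamma * t) = surv
    return taus


rng = random.Random(2024)
example_draw = sample_default_times([0.02, 0.05, 0.10], 0.3, rng)
```

Because the draws in step (2) are independent given the factor, this structure is also a natural starting point for the importance-sampling techniques of Section 11.4, which act on the distribution of $V$.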

The importance-sampling techniques discussed in Section 11.4 in the context of 
one-period Bernoulli mixture models can be applied to improve the performance of 
Algorithm 12.2. These techniques are particularly useful if one is interested in rare 
events, such as in the pricing of CDO tranches with high attachment points. 


12.2.2 Examples 


In this section specific examples of factor copula models will be discussed. Any
random vector with continuous marginal distributions and a $p$-dimensional
conditional independence structure (see Definition 11.9) can be used to construct a factor
copula model. Important examples in practice include factor copula models based
on the Gauss copula $C_P^{\mathrm{Ga}}$, the LT-Archimedean copulas discussed in Section 7.4.2,
and general one-factor copulas including the so-called double-t and double-NIG
copulas.


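Example 12.3 (one-factor Gauss copula). Factor copula models based on a Gauss
copula $C_P^{\mathrm{Ga}}$ are frequently used in practice. We will compute the joint survival
function for the default times in the one-factor case. Let

\[ X_i = \sqrt{\rho_i}\, V + \sqrt{1 - \rho_i}\, \varepsilon_i, \tag{12.17} \]

where $\rho_i \in (0, 1)$ and where $V$ and $(\varepsilon_i)_{1 \le i \le m}$ are iid standard normal rvs. The
random vector $X = (X_1, \dots, X_m)'$ satisfies $X \sim N_m(0, P)$, where the $(i, j)$th element
of $P$ is given by $\rho_{ij} = \sqrt{\rho_i \rho_j}$ for $i \ne j$. We set $U_i = \Phi(X_i)$ so that
$U = (U_1, \dots, U_m)' \sim C_P^{\mathrm{Ga}}$. Both $X$ and $U$ have a one-factor conditional independence structure.

The conditional survival function of $\tau_i$ is easy to compute. Writing
$d_i(t) := \Phi^{-1}(\bar F_i(t))$, we have that

\[ \bar F_{\tau_i \mid V}(t \mid v) = P(U_i \le \bar F_i(t) \mid V = v)
   = P\Bigl(\varepsilon_i \le \frac{d_i(t) - \sqrt{\rho_i}\, v}{\sqrt{1 - \rho_i}} \Bigm| V = v\Bigr), \]

leading to $\bar F_{\tau_i \mid V}(t \mid v) = \Phi((d_i(t) - \sqrt{\rho_i}\, v)/\sqrt{1 - \rho_i})$. Hence

\[ \bar F(t_1, \dots, t_m) = \int_{\mathbb{R}} \prod_{i=1}^m
   \Phi\Bigl(\frac{d_i(t_i) - \sqrt{\rho_i}\, v}{\sqrt{1 - \rho_i}}\Bigr)
   \frac{1}{\sqrt{2\pi}}\, e^{-v^2/2}\, dv, \tag{12.18} \]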


which is easily computed using one-dimensional numerical integration. 
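As a rough illustration of the one-dimensional integration in (12.18), the following sketch uses a composite trapezoidal rule on a truncated grid. It assumes constant hazard rates (marginal survival function $\exp(-\gamma_i t)$) and strictly positive evaluation times; the grid size, the truncation point and the function name are arbitrary choices made for this example.

```python
import math
from statistics import NormalDist

N01 = NormalDist()


def joint_survival_gauss(ts, hazards, rhos, n_grid=2001, v_max=8.0):
    """Approximate (12.18) by the trapezoidal rule on [-v_max, v_max].

    Requires every t_i > 0 so that the normal quantile d_i(t_i) is finite.
    """
    ds = [N01.inv_cdf(math.exp(-g * t)) for g, t in zip(hazards, ts)]
    h = 2.0 * v_max / (n_grid - 1)
    total = 0.0
    for k in range(n_grid):
        v = -v_max + k * h
        prod = 1.0
        for d, rho in zip(ds, rhos):
            prod *= N01.cdf((d - math.sqrt(rho) * v) / math.sqrt(1.0 - rho))
        weight = 0.5 if k in (0, n_grid - 1) else 1.0
        total += weight * prod * N01.pdf(v)
    return total * h


# Two names, five-year joint survival probability, hypothetical inputs.
p_joint = joint_survival_gauss([5.0, 5.0], [0.05, 0.10], [0.5, 0.5])
```

A useful sanity check is the case $m = 1$, where the integral must reproduce the marginal survival probability whatever the value of $\rho_1$.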

In applications of a one-factor Gauss copula model to the pricing of portfolio
credit derivatives it is frequently assumed that $\rho_i = \rho$ for all $i$ (the equicorrelation
case). This model is known as the exchangeable Gauss copula model. In this case
$\rho = \operatorname{corr}(X_i, X_j)$, so $\rho$ is readily interpreted in terms of asset correlation. This
makes the exchangeable Gauss copula model popular with practitioners. In fact, it
is common practice on CDO markets to quote prices for tranches of synthetic CDOs
in terms of implied asset correlations computed in an exchangeable Gauss copula
model, as will be discussed in detail in Section 12.3.2.


Example 12.4 (general one-factor copula). This class of examples generalizes the
construction of the one-factor Gauss copula in Example 12.3; some of the resulting
models are useful in explaining prices of CDO tranches on credit indices. As in
(12.17), one starts with random variables $X_i = \sqrt{\rho_i}\, V + \sqrt{1 - \rho_i}\, \varepsilon_i$ for $\rho_i \in [0, 1]$
and independent rvs $V, \varepsilon_1, \dots, \varepsilon_m$, but now $V$ and the $\varepsilon_i$ can have arbitrary
continuous distributions (not necessarily normal). The distribution $F_{X_i}$ is then given by
the convolution of the distributions of $\sqrt{\rho_i}\, V$ and of $\sqrt{1 - \rho_i}\, \varepsilon_i$. The corresponding
copula is the df of the random vector $U$ with $U_i = F_{X_i}(X_i)$. A similar calculation
to that in the case of the one-factor Gauss copula gives the following result for the
conditional survival function:

\[ \bar F_{\tau_i \mid V}(t \mid v) = F_{\varepsilon}\Bigl(\frac{F_{X_i}^{-1}(\bar F_i(t)) - \sqrt{\rho_i}\, v}{\sqrt{1 - \rho_i}}\Bigr). \]

Popular examples include the so-called double-t copula (the case where $V$ and $\varepsilon_i$
follow a univariate $t_\nu$ distribution) and the double-GH copula (the case where $V$
and $\varepsilon_i$ both follow a univariate GH distribution as introduced in Section 6.2.3).
We note that the double-t copula is not the same as the usual t copula since the
rvs $V, \varepsilon_1, \dots, \varepsilon_m$ do not have a multivariate t distribution (recall that uncorrelated
bivariate t-distributed rvs are not independent, whereas $V$ and $\varepsilon_i$ are independent);
the situation is similar for the double-GH copula.

The main computational challenge in working with the double-t and double-GH
copulas is the determination of the convolution of $V$ and $\varepsilon_i$ and the corresponding
quantile function $F_{X_i}^{-1}$. There exist a number of good solutions to this problem;
references are given in Notes and Comments.


Example 12.5 (LT-Archimedean copulas). Consider a positive rv $V$ with df $G_V$
such that $G_V(0) = 0$ and denote by $\hat G_V$ the Laplace-Stieltjes transform of $G_V$.
According to Definition 7.52 the associated LT-Archimedean copula is given by the
formula

\[ C(u_1, \dots, u_m) = E\Bigl(\exp\Bigl(-V \sum_{i=1}^m \hat G_V^{-1}(u_i)\Bigr)\Bigr). \]

As usual, denote by $\bar F_i(t_i)$ the marginal survival function of $\tau_i$. Using (12.13) the
joint survival function of $\boldsymbol{\tau}$ can be calculated to be

\[ \bar F(t_1, \dots, t_m) = E\Bigl(\prod_{i=1}^m \exp\bigl(-V\, \hat G_V^{-1}(\bar F_i(t_i))\bigr)\Bigr), \tag{12.19} \]

which is obviously of the general form (12.16). Recall that in the special case of
the Clayton copula with parameter $\theta$ we have $V \sim \mathrm{Ga}(1/\theta, 1)$; explicit formulas
for $\hat G_V$ and $\hat G_V^{-1}$ in that case are given in Algorithm 7.53 for the simulation of
LT-Archimedean copulas. Note that LT-Archimedean copulas are in general not radially
symmetric, so the one-period version of the copula model for default times with
survival function (12.19) is not the same as the threshold model with Archimedean
copula presented in Example 11.4; it corresponds instead to a threshold model with
Archimedean survival copula.
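For the Clayton case, Algorithm 12.2 and formula (12.19) can be checked against each other with a short simulation. The sketch below relies on the standard Laplace transform of the $\mathrm{Ga}(1/\theta, 1)$ distribution, $\hat G_V(s) = (1 + s)^{-1/\theta}$, with inverse $\hat G_V^{-1}(u) = u^{-\theta} - 1$, and it assumes constant hazard rates so that $\bar F_i(t) = \exp(-\gamma_i t)$. The function names and numerical inputs are hypothetical.

```python
import math
import random


def sample_clayton_default_times(hazards, theta, rng):
    """Algorithm 12.2 for the Clayton case: V ~ Ga(1/theta, 1), then conditional inversion."""
    v = rng.gammavariate(1.0 / theta, 1.0)
    taus = []
    for gamma in hazards:
        w = 1.0 - rng.random()  # uniform on (0, 1], avoids log(0)
        # Conditional survival function: exp(-v * (exp(gamma*theta*t) - 1)); set it equal to w.
        taus.append(math.log(1.0 - math.log(w) / v) / (gamma * theta))
    return taus


def clayton_joint_survival(ts, hazards, theta):
    """Closed form of (12.19) when V is gamma distributed (Clayton survival copula)."""
    s = sum(math.exp(theta * g * t) - 1.0 for g, t in zip(hazards, ts))
    return (1.0 + s) ** (-1.0 / theta)


rng = random.Random(11)
n_sims = 20000
hits = 0
for _ in range(n_sims):
    t1, t2 = sample_clayton_default_times([0.05, 0.10], 2.0, rng)
    hits += (t1 > 5.0 and t2 > 5.0)
mc_estimate = hits / n_sims
closed_form = clayton_joint_survival([5.0, 5.0], [0.05, 0.10], 2.0)
```

The Monte Carlo estimate and the closed-form value should agree up to simulation error.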


Notes and Comments 


The first copula model for portfolio credit risk was given by Li (2000); his model 
is based on the Gauss copula. General copula models were introduced for the first 
time in Schönbucher and Schubert (2001). One of the first papers on factor copula
models was Laurent and Gregory (2005). These models have received a lot of atten- 
tion, mostly in connection with implied correlation skews (see also Section 12.3 
below). The double-t copula was proposed by Hull and White (2004); numerical 
aspects of this copula model are discussed in Vrins (2009). Double-GH copulas 
are considered by Kalemanova, Schmid and Werner (2005), Guégan and Houdain 
(2005) and Eberlein, Frey and von Hammerstein (2008), among others. 


12.3 Pricing of Index Derivatives in Factor Copula Models 


In this section we are interested in the pricing of index derivatives (index swaps 
and CDOs) in factor copula models. These models are the market standard for deal- 
ing with CDO tranches. We begin with analytical and numerical methods for the 
valuation of index derivatives. In Section 12.3.2 we turn to qualitative properties 
of observed CDO spreads and discuss the well-known correlation skews. In Sec- 
tion 12.3.3 we consider an alternative factor model, known as an implied copula 
model, that has been developed in order to explain correlation skews. 


12.3.1 Analytics 


Throughout we consider a factor copula model with factor-dependent recovery risk.
More precisely, we assume that under the risk-neutral pricing measure $Q$ the survival
function of the default times follows a factor copula model with mixture
representation (12.16). Moreover, the loss given default of firm $i$ is state dependent, so, given
a realization of the factor $V = v$, the loss given default is of the form $\delta_i(v)$ for
some function $\delta_i : \mathbb{R}^p \to (0, 1)$. A state-dependent loss given default provides extra
flexibility for calibrating the model. Consider, for example, the case where $V$ is a
one-dimensional rv and where the conditional default probabilities are increasing in
$V$. By taking increasing functions $\delta_i(v)$ it is possible to model negative correlation
between default probabilities and recovery rates. This extension is reasonable from
an empirical viewpoint, as was discussed in Section 11.2.3.


Index swaps. We begin by computing the value of the default and the premium
payment leg for index swaps. Recall that $V^{\mathrm{Def}} = \sum_{i=1}^m V_i^{\mathrm{Def}}$ and
$V^{\mathrm{Prem}}(x) = \sum_{i=1}^m V_i^{\mathrm{Prem}}(x)$, where $V_i^{\mathrm{Def}}$ and $V_i^{\mathrm{Prem}}(x)$ represent the value of the default and of
the premium payment leg of the single-name CDS on obligor $i$. Moreover, $V_i^{\mathrm{Def}}$ and
$V_i^{\mathrm{Prem}}(x)$ depend only on the loss process of firm $i$, which is given by $L_{t,i} = \delta_i Y_{t,i}$.
The value of an index swap does not therefore actually depend on the copula of the
default times; this is in contrast to the single-tranche CDO analysed below.

We now turn to the actual computations. If the recovery rates are constant, $V_i^{\mathrm{Def}}$
and $V_i^{\mathrm{Prem}}(x)$ can be computed using the results on the pricing of single-name
CDSs in hazard rate models discussed in Section 10.4.4. In the general case one
may proceed as follows. First, similar reasoning to that applied in (12.11) gives

\[ V^{\mathrm{Def}} = E^Q\Bigl(\int_0^T p_0(0, t)\, dL_t\Bigr)
   \approx \sum_{n=1}^N p_0(0, t_n)\bigl(E^Q(L_{t_n}) - E^Q(L_{t_{n-1}})\bigr). \tag{12.20} \]

Since $L_{t_n} = \sum_{i=1}^m \delta_i Y_{t_n,i}$, we find, by conditioning on $V$, that

\[ E^Q(L_{t_n}) = \int_{\mathbb{R}^p} \sum_{i=1}^m \delta_i(v)\, E^Q(Y_{t_n,i} \mid V = v)\, g_V(v)\, dv. \]

Now $Y_{t_n,i}$ is Bernoulli with $Q(Y_{t_n,i} = 1 \mid V = v) = 1 - \bar F_{\tau_i \mid V}(t_n \mid v)$. Hence,

\[ E^Q(L_{t_n}) = \int_{\mathbb{R}^p} \sum_{i=1}^m \delta_i(v)\bigl(1 - \bar F_{\tau_i \mid V}(t_n \mid v)\bigr)\, g_V(v)\, dv, \]

and substitution of this expression into (12.20) gives the value of the default payment
leg. Similar reasoning allows us to conclude that $V^{\mathrm{Prem}}(x)$, the value of the premium
payments for a generic spread $x$, is equal to

\[ V^{\mathrm{Prem}}(x) = x \sum_{n=1}^N p_0(0, t_n)(t_n - t_{n-1})
   \int_{\mathbb{R}^p} \sum_{i=1}^m \bar F_{\tau_i \mid V}(t_n \mid v)\, g_V(v)\, dv. \tag{12.21} \]


In both cases the integration over v can be carried out with numerical quadrature 
methods. This is straightforward in the cases that are most relevant in practice, where 
V is a one-dimensional rv. 
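The quadrature over $v$ can be sketched as follows for a one-factor Gauss copula with a standard normal factor. The sketch assumes constant hazard rates, a flat interest rate $r$ (so that $p_0(0, t) = e^{-rt}$), a common LGD function for all firms, premium payments proportional to the accrual period $t_n - t_{n-1}$, and a simple trapezoidal rule; all names and parameter values are hypothetical.

```python
import math
from statistics import NormalDist

N01 = NormalDist()


def factor_expectation(g, n_grid=1601, v_max=8.0):
    """Trapezoidal approximation of the integral of g(v) * phi(v) over the real line."""
    h = 2.0 * v_max / (n_grid - 1)
    total = 0.0
    for k in range(n_grid):
        v = -v_max + k * h
        weight = 0.5 if k in (0, n_grid - 1) else 1.0
        total += weight * g(v) * N01.pdf(v)
    return total * h


def index_swap_legs(hazards, rho, lgd, pay_times, r):
    """Default leg and premium leg per unit spread, in the spirit of (12.20) and (12.21)."""
    a, b = math.sqrt(rho), math.sqrt(1.0 - rho)
    m = len(hazards)
    v_def = v_prem = prev_loss = prev_t = 0.0
    for t in pay_times:
        ds = [N01.inv_cdf(math.exp(-g * t)) for g in hazards]

        def cond_surv_sum(v, ds=ds):
            # Sum over firms of the conditional survival probabilities given V = v.
            return sum(N01.cdf((d - a * v) / b) for d in ds)

        exp_loss = factor_expectation(lambda v: lgd(v) * (m - cond_surv_sum(v)))
        exp_surv = factor_expectation(cond_surv_sum)
        disc = math.exp(-r * t)
        v_def += disc * (exp_loss - prev_loss)
        v_prem += disc * (t - prev_t) * exp_surv  # (t - prev_t) is the accrual period
        prev_loss, prev_t = exp_loss, t
    return v_def, v_prem


times = [0.5, 1.0, 1.5, 2.0]
hz = [0.01, 0.02, 0.03]
legs_low = index_swap_legs(hz, 0.1, lambda v: 0.6, times, 0.0)
legs_high = index_swap_legs(hz, 0.6, lambda v: 0.6, times, 0.0)
```

With a constant LGD the two calls return the same legs up to quadrature error, which illustrates that the index swap value does not depend on the correlation parameter; the fair spread is the ratio of the first to the second returned value.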


CDO tranches. Recall from Section 12.1.3 that the essential task in computing
CDO spreads is to determine the price of call or put options on $L_{t_n}$. By a conditioning
argument we obtain, for a generic function $f : \mathbb{R}^+ \to \mathbb{R}$ such as the pay-off of a
call or put option, the formula

\[ E^Q(f(L_{t_n})) = \int_{\mathbb{R}^p} E^Q\Bigl(f\Bigl(\sum_{i=1}^m \delta_i(v) Y_{t_n,i}\Bigr) \Bigm| V = v\Bigr)\, g_V(v)\, dv. \]

Table 12.3. Fair tranche spreads computed in an exchangeable Gauss copula model using
the LPA and the simple normal approximation. The exact spread value is computed by an
extensive Monte Carlo approximation. The row labelled “asset correlation” reports the value
of the correlation parameter that is used in the pricing of the tranche. These correlations
are so-called implied tranche correlations (see Table 12.4). The normal approximation is
a substantial improvement over the LPA for the tranches where asset correlation is low,
such as the [3, 6] and [6, 9] tranches; for the other tranches both approximation methods
perform reasonably well. The reason for this is the fact that for high asset correlation the loss
distribution is essentially determined by the realization of $V$, whereas for low asset correlation
the form of the conditional distribution of $L_t$ matters. Numerical values are taken from Frey,
Popp and Weber (2008).

Tranche                      [0, 3]    [3, 6]    [6, 9]    [9, 12]   [12, 22]
Asset correlation $\rho$     0.219     0.042     0.148     0.223     0.305
LPA (%)                      30.66     0.79      0.53      0.36      0.18
Normal approximation (%)     29.38     1.51      0.66      0.42      0.20
Exact spread (%)             28.38     1.55      0.68      0.42      0.20


To discuss this formula we fix $t_n$ and introduce the simpler notation
$q_i(v) := Q(Y_{t_n,i} = 1 \mid V = v)$. Since the rvs $Y_{t_n,i}$, $1 \le i \le m$, are conditionally independent
given $V$, computing the inner conditional expectation essentially amounts to finding
the distribution of a sum of independent Bernoulli rvs. This is fairly straightforward
for a homogeneous portfolio, where the LGD functions and the conditional default
probabilities for all firms are equal, so that $q_i(v) = q(v)$ and $\delta_i(v) = \delta(v)$. In
that case, $L_{t_n} = \delta(V) M_{t_n}$, where $M_{t_n} = \sum_{i=1}^m Y_{t_n,i}$ gives the number of firms that
have defaulted up to time $t_n$. Given $V = v$, $M_{t_n}$ is the sum of independent and (due
to the homogeneity) identically distributed Bernoulli rvs, so $M_{t_n}$ has a binomial
distribution with parameters $m$ and $q(v)$.
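For the homogeneous case the conditioning argument can be carried out directly: the inner expectation is a finite binomial sum and the outer integral is one-dimensional. The sketch below assumes a standard normal factor, expresses losses as a fraction of total notional, and is meant for portfolios of moderate size (for very large $m$ the binomial coefficients would overflow a float). Function names and inputs are illustrative assumptions.

```python
import math
from statistics import NormalDist

N01 = NormalDist()


def expected_payoff_homogeneous(f, m, q_of_v, lgd_of_v, n_grid=401, v_max=8.0):
    """E^Q f(L) where, given V = v, L = lgd(v) * M / m and M ~ Binomial(m, q(v))."""
    coeffs = [math.comb(m, k) for k in range(m + 1)]
    h = 2.0 * v_max / (n_grid - 1)
    total = 0.0
    for j in range(n_grid):
        v = -v_max + j * h
        q, lgd = q_of_v(v), lgd_of_v(v)
        # Inner conditional expectation: finite sum over the number of defaults k.
        inner = sum(c * q**k * (1.0 - q)**(m - k) * f(lgd * k / m)
                    for k, c in enumerate(coeffs))
        weight = 0.5 if j in (0, n_grid - 1) else 1.0
        total += weight * inner * N01.pdf(v)
    return total * h


def gauss_cond_default_prob(t, hazard, rho):
    """q(v) for the exchangeable Gauss copula with survival function exp(-hazard * t)."""
    d = N01.inv_cdf(math.exp(-hazard * t))
    return lambda v: 1.0 - N01.cdf((d - math.sqrt(rho) * v) / math.sqrt(1.0 - rho))


def tranche_loss(lower, upper):
    """Pay-off f(l) = min(max(l - lower, 0), upper - lower)."""
    return lambda l: min(max(l - lower, 0.0), upper - lower)


q = gauss_cond_default_prob(5.0, 0.01, 0.3)
expected_loss = expected_payoff_homogeneous(lambda l: l, 125, q, lambda v: 0.6)
equity_loss = expected_payoff_homogeneous(tranche_loss(0.0, 0.03), 125, q, lambda v: 0.6)
```

The tranche pay-off is a difference of two call pay-offs on the portfolio loss, so the same routine covers the call and put options mentioned above.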

In a heterogeneous portfolio, on the other hand, the evaluation of the conditional 
loss distribution is not straightforward, essentially because one is dealing with sums 
of independent Bernoulli rvs that are not identically distributed. We discuss several 
approaches to this problem in the remainder of this section. 


Large-portfolio approximation. In the so-called large-portfolio approximation
(LPA), the conditional distribution of $L_{t_n}$ given $V = v$ is replaced by a point mass
at the conditional mean $\ell(v) = \sum_{i=1}^m \delta_i(v) q_i(v)$. This leads to the approximation

\[ E^Q(f(L_{t_n})) \approx \int_{\mathbb{R}^p} f(\ell(v))\, g_V(v)\, dv. \tag{12.22} \]

The method is motivated by the asymptotic results of Section 11.3. In particular,
in that section we have shown that, in the case of one-factor models, quantiles of
the portfolio loss distribution can be well approximated by quantiles of the rv $\ell(V)$,
provided that the portfolio is sufficiently large and not too inhomogeneous. One 
would therefore expect that for typical credit portfolios the approximation (12.22) 
performs reasonably well. Numerical results on the performance of the LPA are 
given in Table 12.3. 
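A minimal sketch of the approximation (12.22) for a homogeneous exchangeable Gauss copula model is given below. It assumes a standard normal factor, a constant LGD and a constant hazard rate, and it expresses the conditional mean loss as a fraction of total notional; names and inputs are hypothetical.

```python
import math
from statistics import NormalDist

N01 = NormalDist()


def lpa_expected_payoff(f, cond_mean_loss, n_grid=801, v_max=8.0):
    """Approximation (12.22): integrate f(l(v)) against the standard normal density."""
    h = 2.0 * v_max / (n_grid - 1)
    total = 0.0
    for j in range(n_grid):
        v = -v_max + j * h
        weight = 0.5 if j in (0, n_grid - 1) else 1.0
        total += weight * f(cond_mean_loss(v)) * N01.pdf(v)
    return total * h


def homogeneous_gauss_mean_loss(t, hazard, rho, lgd):
    """l(v) = lgd * q(v), as a fraction of notional, for the exchangeable Gauss copula."""
    d = N01.inv_cdf(math.exp(-hazard * t))
    return lambda v: lgd * (1.0 - N01.cdf((d - math.sqrt(rho) * v) / math.sqrt(1.0 - rho)))


ell = homogeneous_gauss_mean_loss(5.0, 0.01, 0.3, 0.6)
lpa_mean = lpa_expected_payoff(lambda l: l, ell)
lpa_equity = lpa_expected_payoff(lambda l: min(l, 0.03), ell)
```

For linear $f$ the LPA is exact. By Jensen's inequality it cannot fall below the exact value for concave pay-offs such as the equity tranche loss $\min(l, u)$, and it cannot exceed it for convex pay-offs; this is in line with the LPA overstating the [0, 3] spread in Table 12.3.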
Normal and Poisson approximation. The LPA completely ignores the impact of
the fluctuations of $L_{t_n}$ around its conditional mean. One way of capturing these
fluctuations is to replace the unknown conditional distribution of $L_{t_n}$ with a known
distribution with similar moments. In a simple normal approximation the conditional
distribution of $L_{t_n}$ is approximated by a normal distribution with mean $\ell(v)$ and
variance

\[ \sigma^2(v) := \operatorname{var}(L_{t_n} \mid V = v) = \sum_{i=1}^m \delta_i^2(v)\, q_i(v)(1 - q_i(v)) \]

(the choice of the normal distribution is of course motivated by the central limit
theorem). Under the normal approximation one has

\[ E^Q(f(L_{t_n})) \approx \int_{\mathbb{R}^p} \frac{1}{\sqrt{2\pi\sigma^2(v)}}
   \int_{\mathbb{R}} f(l) \exp\Bigl(-\frac{(l - \ell(v))^2}{2\sigma^2(v)}\Bigr)\, dl\; g_V(v)\, dv. \]

The performances of the normal approximation and the LPA are illustrated in
Table 12.3 for the case of an exchangeable Gauss copula model.

In a similar spirit, in a Poisson approximation the conditional loss distribution is
approximated by a Poisson distribution with parameter $\lambda = \ell(v)$. These simple
normal and Poisson approximation methods can be refined substantially (see El Karoui,
Kurtz and Jiao (2008) for details). There are a number of other techniques for
evaluating the conditional loss distribution. On the one hand, one may apply Fourier or
Laplace inversion to the problem; on the other hand, there are a number of recursive
techniques that can be used. References for both approaches can be found in Notes
and Comments.
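For the normal approximation, the inner integral is available in closed form when $f$ is a call pay-off, which leaves only the integration over $v$. The following short derivation is a worked illustration under the assumption $f(l) = (l - K)^+$ with a strike $K$ introduced here for the example; it uses only the standard identity $E((Z - a)^+) = \phi(a) - a\,\Phi(-a)$ for a standard normal rv $Z$.

```latex
% Conditional on V = v, write L = \ell(v) + \sigma(v) Z with Z standard normal,
% and set z(v) = (\ell(v) - K)/\sigma(v). Then
\[
  \frac{1}{\sqrt{2\pi\sigma^2(v)}} \int_{\mathbb{R}} (l - K)^+
  \exp\Bigl(-\frac{(l - \ell(v))^2}{2\sigma^2(v)}\Bigr)\, dl
  = (\ell(v) - K)\,\Phi(z(v)) + \sigma(v)\,\phi(z(v)),
\]
% so that the normal approximation of a call on the portfolio loss becomes
\[
  E^Q\bigl((L_{t_n} - K)^+\bigr) \approx \int_{\mathbb{R}^p}
  \bigl[(\ell(v) - K)\,\Phi(z(v)) + \sigma(v)\,\phi(z(v))\bigr]\, g_V(v)\, dv .
\]
```

As $\sigma(v) \to 0$ the integrand tends to $(\ell(v) - K)^+$, which recovers the LPA (12.22) for this pay-off.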


12.3.2 Correlation Skews 


Investors typically express CDO spreads in terms of implied correlations computed 
in a simple homogeneous Gauss copula model known as the benchmark model. In 
this way tranche spreads can be compared across attachment points and maturities. 
This practice is similar to the use of implied volatilities as a common yardstick 
for comparing prices of equity and currency options. In this section we explain 
this quoting convention in more detail and discuss qualitative properties of market- 
observed implied correlations. 

The benchmark model is an exchangeable Gauss copula model, as in
Example 12.3. It is assumed that all default times are exponentially distributed with
$Q(\tau_i > t) = e^{-\gamma^Q t}$ for all $i$ and that the LGDs are deterministic and equal to
60% for all firms. The hazard rate $\gamma^Q$ is used to calibrate the model to the spread of
the index swap with the same maturity as the tranche under consideration; see the
discussion of calibration of a homogeneous model to index swaps in Section 12.1.3.
The only free parameter in the benchmark model is the “asset correlation” $\rho$. This
parameter can be used to calibrate the model to tranche spreads observed in the
market, which leads to various notions of implied correlation.

Implied correlations. The market uses two notions of implied correlation in order
to describe market-observed tranche spreads. Implied tranche correlation, also
known as compound correlation, is the lowest number $\rho \in [0, 1]$ such that the
spread computed in the benchmark model equals the spread observed in the market.
For mezzanine tranches the notion of implied tranche correlation suffers from two
problems. First, a correlation number $\rho$ such that the model spread matches the
market spread does not always exist; this problem was encountered frequently for the
spreads that were quoted in the period 2008-9 (during the financial crisis). Second,
for certain values of the market spread there is more than one matching value of
$\rho \in [0, 1]$, and the convention to quote the lowest such number is arbitrary.

If market spreads are available for a complete set of tranches, as is the case for 
the standardized tranches on iTraxx and CDX, the so-called base correlation is 
frequently used as an alternative means of expressing tranche spreads. To compute 
base correlation we recursively compute implied correlations for a nested set of 
equity tranches so that they are consistent with the observed tranche spreads. The 
base correlation methodology is popular since it mitigates the calibration problems 
arising in the computation of implied tranche correlation. On the other hand, base 
correlations are more difficult to interpret and there may be theoretical inconsis- 
tencies, as we explain below. In the following algorithm the computation of base 
correlation is described in more detail. 


Algorithm 12.6. Suppose that we observe market spreads $x_\kappa^*$, $\kappa = 1, \dots, K$, for
a set of CDO tranches with attachment points $(l_\kappa, u_\kappa)$, $\kappa = 1, \dots, K$, with $l_1 = 0$
and $l_{\kappa+1} = u_\kappa$. Denote by $V^{\mathrm{Prem}}(u, \rho, x)$ the value of the premium payment leg
of an equity tranche with attachment point $u$ and spread $x > 0$ in the benchmark
model with correlation parameter $\rho \in [0, 1]$. Denote by $V^{\mathrm{Def}}(u, \rho)$ the value of the
corresponding default payment leg. We carry out the following steps.

(1) Compute the base correlation $\rho_1$ as the implied tranche correlation of the
    $[0, u_1]$ tranche using the observed spread $x_1^*$. Set $\kappa = 1$.

(2) Given the base correlation $\rho_\kappa$ and the spread $x_{\kappa+1}^*$ (the market spread of the
    $[l_{\kappa+1}, u_{\kappa+1}]$ tranche), compute the base correlation $\rho_{\kappa+1}$ as the solution of
    the following equation (an explanation is given after the algorithm):

    \[ V^{\mathrm{Def}}(u_{\kappa+1}, \rho_{\kappa+1}) - V^{\mathrm{Def}}(u_\kappa, \rho_\kappa)
       = V^{\mathrm{Prem}}(u_{\kappa+1}, \rho_{\kappa+1}, x_{\kappa+1}^*) - V^{\mathrm{Prem}}(u_\kappa, \rho_\kappa, x_{\kappa+1}^*). \tag{12.23} \]

(3) Replace $\kappa$ with $\kappa + 1$ and return to step (2).
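The bootstrap in Algorithm 12.6 can be sketched generically, with the benchmark-model pricers passed in as functions. In the sketch below `v_def(u, rho)` and `v_prem(u, rho, x)` are placeholders for those pricers, and the toy legs at the end are not a pricing model; they are smooth, monotone stand-ins chosen only so that the example runs. Bisection is used as the root finder, which assumes a sign change on the search interval and returns a root rather than necessarily the lowest one.

```python
def bisect_root(g, lo, hi, tol=1e-12, max_iter=200):
    """Plain bisection; requires a sign change of g on [lo, hi]."""
    g_lo, g_hi = g(lo), g(hi)
    if g_lo * g_hi > 0.0:
        raise ValueError("no sign change: the spread may not be attainable")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        g_mid = g(mid)
        if g_lo * g_mid <= 0.0:
            hi = mid
        else:
            lo, g_lo = mid, g_mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


def base_correlations(spreads, uppers, v_def, v_prem, lo=0.0, hi=1.0):
    """Bootstrap rho_1, ..., rho_K from tranche spreads via equation (12.23)."""
    rhos = []
    u_prev, rho_prev = 0.0, None
    for x, u in zip(spreads, uppers):
        if rho_prev is None:
            # Step (1): the [0, u_1] tranche; the legs of an empty tranche are zero.
            def_prev, prem_prev = 0.0, 0.0
        else:
            # Step (2): lower equity tranche valued with the previous base correlation.
            def_prev = v_def(u_prev, rho_prev)
            prem_prev = v_prem(u_prev, rho_prev, x)

        def gap(rho, u=u, x=x, def_prev=def_prev, prem_prev=prem_prev):
            return (v_def(u, rho) - def_prev) - (v_prem(u, rho, x) - prem_prev)

        rho = bisect_root(gap, lo, hi)
        rhos.append(rho)
        u_prev, rho_prev = u, rho  # step (3)
    return rhos


# Toy legs for illustration only (hypothetical, not the benchmark model).
def toy_def(u, rho):
    return u * (1.0 - 0.5 * rho)


def toy_prem(u, rho, x):
    return x * u * (0.5 + 0.5 * rho)
```

In an actual implementation the two callables would be tranche pricers of the kind described in Section 12.3.1, and the search interval would typically exclude the boundary values of $\rho$.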


Equation (12.23) is derived from the following considerations. It is the basic
premise of the base correlation approach that for all $1 \le \kappa \le K$ and for any spread
$x$, the correct values of the default and premium legs of an equity tranche with upper
attachment point $u_\kappa$ are given by $V^{\mathrm{Def}}(u_\kappa, \rho_\kappa)$ and $V^{\mathrm{Prem}}(u_\kappa, \rho_\kappa, x)$, respectively.
Under this assumption, the left-hand side of (12.23) gives the value of the default
payment leg of the $[l_{\kappa+1}, u_{\kappa+1}]$ tranche and the right-hand side gives the value of
the corresponding premium payment leg at the spread $x_{\kappa+1}^*$ observed in the market.
For the observed spread $x_{\kappa+1}^*$, the value of the default and of the premium payment
leg of the $[l_{\kappa+1}, u_{\kappa+1}]$ tranche have to be equal, which leads to equation (12.23)
for $\rho_{\kappa+1}$.

Note that, under the base-correlation methodology, two different models are used
to derive the model value of the $[l_\kappa, u_\kappa]$ tranche: the benchmark model with
correlation parameter $\rho_\kappa$ is used for the upper attachment point and the benchmark model
with correlation parameter $\rho_{\kappa-1}$ is used for the lower attachment point. As shown
in Brigo, Pallavicini and Torresetti (2010), this may lead to inconsistencies if the
base correlation approach is employed in the pricing of tranches with non-standard
attachment points.


Empirical properties of implied correlations. In Table 12.4 we give market spreads 
and implied tranche correlations for iTraxx tranches for three different days in 2004, 
2006 and 2008. These numbers are representative of three different periods in the 
credit index market: the early days of the market; before the credit crisis; and during 
the credit crisis. We see that implied correlations are quite unstable. In particular, 
a standard Gauss copula with a fixed correlation parameter $\rho$ cannot explain all
tranche spreads simultaneously (otherwise the implied correlation curves would be 
flat). 

Before the financial crisis, implied correlations exhibited a typical form that 
became known as an implied correlation skew: implied tranche correlations showed 
a V-shaped relationship to attachment point; implied base correlations were strictly 
increasing; and implied correlations for senior tranches were comparatively high. 
The 2004 and 2006 rows of the table are typical of the behaviour of implied tranche 
correlations (see also Figure 12.3). 

With the onset of the financial crisis, tranche spreads and implied tranche cor- 
relations became quite irregular. In particular, for mezzanine tranches it was often 
impossible to find a correlation number that would reproduce the spread that was 
observed on that day. For a more detailed description of the behaviour of implied 
correlation we refer the reader to the book by Brigo, Pallavicini and Torresetti (2010). 

It can be argued that correlation skews reflect deficiencies of the benchmark 
model. To begin with, the Gauss copula is an ad hoc choice that is motivated mostly 
by analytical convenience and not by thorough data analysis. Moreover, it is highly 
unlikely that the dependence structure of the default times in a portfolio can be char- 
acterized by a single correlation number. Also, the assumption of constant recovery 
rates is at odds with reality. 


Explaining observed CDO spreads in factor copula models. The fact that the 
benchmark model cannot reproduce observed CDO spreads for several tranches 
creates problems for the pricing of so-called bespoke tranches with non-standard 
maturities or attachment points and for the risk management of a book of tranches. 
In these applications one would like to take all available price information into 
account. There is a need for portfolio credit risk models that can be made consistent 
with observed spreads for several tranches simultaneously.

Table 12.4. Market quotes and implied tranche correlations for five-year tranches on the
iTraxx Europe index on different dates. Note that the spread for the equity tranche corresponds
to an upfront payment quoted as a percentage of the notional; the quarterly spread for the
equity tranche is set to 5% by market convention.

Type of data          Year   Index     [0,3]    [3,6]     [6,9]    [9,12]    [12,22]
Market spread         2004   42 bp     27.6%    168 bp    70 bp    43 bp     20 bp
Tranche correlation   2004             22.4%    5.0%      15.3%    22.6%     30.6%
Market spread         2006   26 bp     14.5%    62 bp     18 bp    7 bp      3 bp
Tranche correlation   2006             18.6%    7.9%      14.1%    17.25%    23.54%
Market spread         2008   150 bp    46.5%    5.7%      3.7%     2.3%      1.45%
Tranche correlation   2008             51.1%    85.7%     95.3%    3.0%      17.6%

Figure 12.3. Compound correlation and base correlation corresponding to the CDO spreads
in the 2004 row of Table 12.4. The spreads exhibit a typical correlation skew. (Data from Hull
and White (2004).)


Quite naturally, many researchers have addressed this problem by considering 
more general factor copula models where some of the unrealistic assumptions made 
in the benchmark model are relaxed. To begin with, several authors have shown that 
by introducing state-dependent recovery rates that are negatively correlated with 
default probabilities, the correlation skew can be mitigated (but not eliminated com- 
pletely). More importantly, alternative copula models have been employed, mostly 
models from the class of general one-factor copulas introduced in Example 12.4. It 
turns out that the double-t copula and, in particular, the double-GH copula model 
can be calibrated reasonably well to market-observed CDO tranche spreads (see, for 
example, Eberlein, Frey and von Hammerstein 2008). However, the empirical per- 
formance of these models worsened substantially with the onset of the credit crisis 
in 2007. For further details we refer to the references given in Notes and Comments. 


12.3.3 The Implied Copula Approach 


The class of implied copula models can be viewed as a generalization of the one- 
factor copula models considered in Section 12.2.2. In an implied copula model 
the factor variable $V$ is modelled as a discrete rv whose probability mass function
is determined by calibration to market data such as observed index and tranche
spreads. Since the support of V is usually taken as a fairly large set, this creates 
some additional flexibility that helps to explain observed market spreads for CDO 
tranches. 


Definition and key properties. The structure of the joint survival function of
$\tau_1, \dots, \tau_m$ in an implied copula model is similar to the mixture representation
(12.16) in a factor copula model. It is assumed that the $\tau_i$ are conditionally
independent given a mixing variable $V$ that takes the discrete values $v_1, \dots, v_K$ (the states
of the system); the conditional survival probabilities are of the form

\[ Q(\tau_i > t \mid V = v_k) = \exp(-\lambda_i(v_k) t), \quad 1 \le i \le m, \quad t \ge 0, \tag{12.24} \]

for functions $\lambda_i : \{v_1, \dots, v_K\} \to (0, \infty)$. The probability mass function of $V$ is
denoted by $\pi = (\pi_1, \dots, \pi_K)$ with $\pi_k = Q(V = v_k)$, $k = 1, \dots, K$. It follows that
the joint survival function of $\tau_1, \dots, \tau_m$ is given by

\[ \bar F(t_1, \dots, t_m) = \sum_{k=1}^K \pi_k \prod_{i=1}^m \exp(-\lambda_i(v_k) t_i). \tag{12.25} \]

It is possible to make the functions $\lambda_i(\cdot)$ in (12.24) time dependent, and this is in
fact necessary if the model is to be calibrated simultaneously to tranche spreads of
different maturities. We will describe the time-independent case for ease of
exposition.

Note that the name “implied copula model” is somewhat misleading, since
both the dependence structure and the marginal distributions of the $\tau_i$ change
if the distribution of $V$ is changed. For instance, for $\pi = (1, 0, \dots, 0)$ we
have $Q(\tau_i > t) = \exp(-\lambda_i(v_1) t)$, whereas for $\pi = (0, \dots, 0, 1)$ we have
$Q(\tau_i > t) = \exp(-\lambda_i(v_K) t)$. A better name for (12.25) would therefore be “a
model with an implied factor distribution”, but the label “implied copula model”
has become standard, essentially because of the influential paper by Hull and White
(2006).

In order to compute spreads of CDSs, index swaps or single-tranche CDOs in an
implied copula model, we proceed in a similar fashion to the factor copula models
discussed in the previous section. First we compute the conditional values of the
default payment leg and the premium payment leg given $V = v_k$, denoted by
$V^{\mathrm{Def}}(v_k)$ and $V^{\mathrm{Prem}}(v_k, x)$, where $x$ represents a generic spread level; the methods
described in Section 12.3.1 can be used for this purpose. The unconditional values
of the default payment and the premium payment legs are then given by averaging
over the different states: that is,

\[ V^{\mathrm{Def}} = \sum_{k=1}^K \pi_k V^{\mathrm{Def}}(v_k) \quad\text{and}\quad
   V^{\mathrm{Prem}}(x) = \sum_{k=1}^K \pi_k V^{\mathrm{Prem}}(v_k, x). \]

The fair spread of the structure is equal to $x^* = V^{\mathrm{Def}} / V^{\mathrm{Prem}}(1)$.
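The averaging over states can be sketched for a single-name CDS as follows. The conditional legs are computed under simplifying assumptions that are not part of the text: a flat interest rate $r$, a constant LGD, default payments made on the next premium date and no accrued premium. Function names and numerical inputs are hypothetical.

```python
import math


def cds_legs_in_state(lam, lgd, pay_times, r):
    """Conditional default leg and premium leg per unit spread, given intensity lam."""
    v_def = v_prem = 0.0
    prev_t = 0.0
    for t in pay_times:
        disc = math.exp(-r * t)
        # Probability of default in (prev_t, t], paid at t.
        v_def += disc * lgd * (math.exp(-lam * prev_t) - math.exp(-lam * t))
        # Premium accrues over (prev_t, t] and is paid at t if the firm has survived.
        v_prem += disc * (t - prev_t) * math.exp(-lam * t)
        prev_t = t
    return v_def, v_prem


def fair_spread(pi, lambdas, lgd, pay_times, r):
    """x* = V^Def / V^Prem(1), with both legs averaged over the states."""
    legs = [cds_legs_in_state(lam, lgd, pay_times, r) for lam in lambdas]
    v_def = sum(p * d for p, (d, _) in zip(pi, legs))
    v_prem = sum(p * q for p, (_, q) in zip(pi, legs))
    return v_def / v_prem


quarterly = [0.25 * n for n in range(1, 21)]  # five years of quarterly dates
x_mix = fair_spread([0.7, 0.3], [0.003, 0.08], 0.6, quarterly, 0.02)
```

Both legs are linear in the probabilities `pi`, which is the property exploited in the calibration discussed later in this section.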
Implied copula models are mostly used for the pricing of CDO tranches with
non-standard attachment points or maturities. It is also possible to value tranches where
the underlying pool is different from a standard credit index, but the valuation of
such products is somewhat subjective; we refer to the references given in Notes and
Comments. In Section 17.4.3 we explain how to embed an implied copula model into
a dynamic portfolio credit risk model. In that extension of the model it is possible
to price options on standard index products such as CDS index swaps.


Example 12.7 (implied copula models for a homogeneous portfolio). In a
homogeneous portfolio the default intensities of all firms are identical, $\lambda_i(\cdot) = \lambda(\cdot)$, and
we may parametrize the model directly in terms of the values $\lambda_1, \dots, \lambda_K$ of the
default intensity in the different states. Moreover, the states can be ordered so that
$\lambda_1 < \lambda_2 < \cdots < \lambda_K$. In this way, state 1 can be viewed as the best state of the
economy (the one with the lowest default intensity) and state $K$ corresponds to the
worst state. In our calibration example below we use a model with $K = 9$ states
and we assume that the default intensity takes values in the set {0.01%, 0.3%, 0.6%,
1.2%, 2.5%, 4.0%, 8.0%, 20%, 70%}.
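To illustrate how mixing over states generates dependence, the joint survival function (12.25) can be evaluated on the nine-state grid of this example. The state probabilities below are a uniform placeholder chosen for illustration, not calibrated values.

```python
import math

# Intensity states from Example 12.7, written as decimals.
LAMBDAS = [0.0001, 0.003, 0.006, 0.012, 0.025, 0.04, 0.08, 0.20, 0.70]


def joint_survival(ts, pi, lambdas=LAMBDAS):
    """Equation (12.25) for a homogeneous portfolio.

    With identical intensities, prod_i exp(-lambda_k * t_i) = exp(-lambda_k * sum(ts)).
    """
    return sum(p * math.exp(-lam * sum(ts)) for p, lam in zip(pi, lambdas))


uniform_pi = [1.0 / len(LAMBDAS)] * len(LAMBDAS)  # hypothetical, not calibrated
p_single = joint_survival([5.0], uniform_pi)      # one firm survives five years
p_pair = joint_survival([5.0, 5.0], uniform_pi)   # two firms both survive five years
```

The pairwise survival probability exceeds the square of the single-name probability whenever the state distribution is not degenerate, which reflects the positive dependence induced by the common state variable.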


For heterogeneous portfolios we require more sophisticated parametrizations. An 
example can be found in Rosen and Saunders (2009); further references are given 
in Notes and Comments. 

In practical implementations of the model the set of states $\{v_1, \dots, v_K\}$ is typically
chosen at the outset of the analysis and is kept fixed during model calibration.
Moreover, in order to obtain pricing results for bespoke index products that are
robust with respect to the precise values of the $v_k$ it is advisable to work on a fine
grid, and hence with a fairly large number of states $K$. Of course, this means that
the determination of the probability mass function $\pi$ becomes a high-dimensional
problem, but the calibration of $\pi$ is comparatively easy, as we now explain.


Calibration of $\pi$. In the implied copula framework we need to determine the
probability mass function $\pi$ from observed market data. As is usual in model calibration,
$\pi$ is found by minimizing some distance between market prices and model prices.
This is substantially facilitated by the observation that model prices are linear in $\pi$.
The set of all probability mass functions consistent with the price information at a
given point in time $t$ can therefore be described in terms of a set of linear
inequalities. We now explain this for the case of a corporate bond and a single-name CDS;
similar arguments apply for index swaps and CDO tranches.


e Consider a zero-coupon bond issued by firm i with maturity T , and denote by 
pi(vk) = exp(—Aj (vz) (T — t)) the value of the bond in state vg (we assume 
a recovery rate equal to zero for simplicity). Suppose that we observe bid and 
ask quotes p < Pp for the bond. In order to be consistent with this information, 
a probability mass function x needs to satisfy the following linear inequalities: 


K 
P< do mpi) <P. 
k=1 


• Consider next a CDS contract on firm $i$; this is a simple example of a contract
where two cash-flow streams are exchanged. Suppose that at time $t$ we
observe bid and ask spreads $\underline{x} \le \overline{x}$ for the contract. Denote by
$V^{\mathrm{prem}}(\psi_k, x)$ and $V^{\mathrm{def}}(\psi_k)$ the values of the premium payment and
default payment legs of the contract in state $k$ for a generic CDS spread $x$. Then $\pi$
must satisfy the following two inequalities:

$$\sum_{k=1}^{K} \pi_k \bigl( V^{\mathrm{prem}}(\psi_k, \underline{x}) - V^{\mathrm{def}}(\psi_k) \bigr) \le 0,$$

$$\sum_{k=1}^{K} \pi_k \bigl( V^{\mathrm{prem}}(\psi_k, \overline{x}) - V^{\mathrm{def}}(\psi_k) \bigr) \ge 0.$$


Moreover, $\pi$ needs to satisfy the obvious linear constraints $\pi_k \ge 0$ for all $k$ and
$\sum_{k=1}^{K} \pi_k = 1$.
It follows from the above discussion that the constraints on $\pi$ at time $t$ can be
written in the generic form

$$A\pi \le b$$

for some matrix $A \in \mathbb{R}^{N \times K}$, some vector $b \in \mathbb{R}^{N}$ and some $N \in \mathbb{N}$. In order to find a
vector $\pi$ that satisfies this system of linear inequalities, fix a vector $c = (c_1, \dots, c_K)$
of weights and consider the linear programming problem

$$\min_{\pi \in \mathbb{R}^K} \sum_{k=1}^{K} c_k \pi_k \quad \text{subject to } A\pi \le b. \tag{12.26}$$

Clearly, every solution of (12.26) is a probability mass function that is consistent
with the given price information in the sense that it solves the system $A\pi \le b$. Note
that problem (12.26) can be solved with standard linear programming software.
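
To make the generic form of the constraints concrete, the following minimal sketch assembles the matrix and right-hand side for a single zero-recovery bond in a nine-state homogeneous setting. It is an illustration under stated assumptions, not an implementation from the literature: the intensity grid is the one quoted for the homogeneous example, whereas the five-year horizon, the bid and ask quotes and all function names are hypothetical. The equality constraint on the total mass is encoded as a pair of inequalities.

```python
import math

# Default-intensity grid of the homogeneous example, as decimals rather than percent.
INTENSITIES = [0.0001, 0.003, 0.006, 0.012, 0.025, 0.04, 0.08, 0.20, 0.70]


def bond_prices(intensities, horizon):
    """Zero-recovery bond value exp(-lambda * (T - t)) in each state."""
    return [math.exp(-lam * horizon) for lam in intensities]


def build_constraints(prices, bid, ask):
    """Return (A, b) such that A pi <= b encodes all calibration constraints."""
    K = len(prices)
    A, b = [], []

    def add(row, rhs):
        A.append(row)
        b.append(rhs)

    add(list(prices), ask)            # sum_k pi_k p_k <= ask
    add([-p for p in prices], -bid)   # sum_k pi_k p_k >= bid
    add([1.0] * K, 1.0)               # sum_k pi_k <= 1
    add([-1.0] * K, -1.0)             # sum_k pi_k >= 1
    for k in range(K):                # pi_k >= 0 for every state
        row = [0.0] * K
        row[k] = -1.0
        add(row, 0.0)
    return A, b


def is_feasible(A, b, pi, tol=1e-12):
    """Check the system A pi <= b row by row."""
    return all(sum(a * x for a, x in zip(row, pi)) <= rhs + tol
               for row, rhs in zip(A, b))


prices = bond_prices(INTENSITIES, horizon=5.0)          # hypothetical T - t
A, b = build_constraints(prices, bid=0.93, ask=0.95)    # hypothetical quotes

uniform = [1.0 / 9] * 9
point_mass = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]  # all mass on 1.2%
print(is_feasible(A, b, uniform), is_feasible(A, b, point_mass))
```

The pair `(A, b)` produced in this way is what would be handed to a linear programming routine in order to solve (12.26); further instruments simply contribute further rows.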

If the number of states $K$ is large compared with the number of constraints,
there will typically be more than one probability mass function $\pi$ that solves the
system $A\pi \le b$. An easy way to check this is to vary the weight vector $c$ in
(12.26), since different weight vectors usually correspond to different solutions of
the system $A\pi \le b$. In that case a unique solution $\pi^*$ of the calibration problem
can be determined by a suitable regularization procedure. For instance, one could
choose $\pi^*$ by minimizing the relative entropy to the uniform distribution on the set
$\{\psi_1, \dots, \psi_K\}$. This leads to the convex optimization problem

$$\min_{\pi \in \mathbb{R}^K} \sum_{k=1}^{K} \pi_k \ln \pi_k \quad \text{subject to } A\pi \le b, \tag{12.27}$$

which can be addressed with standard convex programming algorithms (see, for 
example, Bertsimas and Tsitsiklis 1997). Model calibration via entropy minimiza- 
tion has a number of attractive features, as is explained, for example, in Avellaneda 
(1998). In particular, the prices of bespoke tranches (the model output) depend 
smoothly on the spreads observed on the market (the model input). 
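
For a single instrument the entropy problem (12.27) can be solved almost by hand, which gives a feel for how the regularized solution behaves. The sketch below is an illustration under stated assumptions rather than the procedure used in the papers cited here. It takes one zero-recovery bond with a hypothetical five-year horizon and hypothetical bid and ask quotes, and it relies on the standard Lagrangian argument that the minimizer is either the uniform distribution or an exponentially tilted distribution, with weights proportional to $\exp(-\theta p_k)$, whose model price sits on the binding quote. The function names and the bisection bracket are our own choices, and the bracket assumes that the binding quote lies strictly between the smallest and largest state prices.

```python
import math

INTENSITIES = [0.0001, 0.003, 0.006, 0.012, 0.025, 0.04, 0.08, 0.20, 0.70]
PRICES = [math.exp(-lam * 5.0) for lam in INTENSITIES]  # hypothetical 5-year bond


def tilted_pmf(prices, theta):
    """Exponentially tilted pmf with weights proportional to exp(-theta * p_k)."""
    shift = max(-theta * p for p in prices)   # subtract the maximum to avoid overflow
    weights = [math.exp(-theta * p - shift) for p in prices]
    total = sum(weights)
    return [w / total for w in weights]


def model_price(prices, pi):
    """Model price sum_k pi_k p_k, which is linear in pi."""
    return sum(w * p for w, p in zip(pi, prices))


def min_entropy_pmf(prices, bid, ask, lo=-500.0, hi=500.0, iters=200):
    """Minimise sum_k pi_k ln pi_k subject to bid <= sum_k pi_k p_k <= ask."""
    K = len(prices)
    uniform = [1.0 / K] * K
    price_u = model_price(prices, uniform)
    if bid <= price_u <= ask:
        return uniform                        # the unconstrained optimum is feasible
    target = bid if price_u < bid else ask    # the quote that binds
    # The tilted model price is decreasing in theta, so bisection locates the root.
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if model_price(prices, tilted_pmf(prices, mid)) > target:
            lo = mid
        else:
            hi = mid
    return tilted_pmf(prices, 0.5 * (lo + hi))


pi_star = min_entropy_pmf(PRICES, bid=0.93, ask=0.95)   # hypothetical quotes
print([round(100 * w, 2) for w in pi_star], round(model_price(PRICES, pi_star), 6))
```

With several instruments the same exponential structure persists, but one multiplier per binding constraint has to be found, which is where general convex programming algorithms come in.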

We close this section on implied copula models by presenting results from a
calibration exercise of Frey and Schmidt (2011). In that paper the homogeneous
model of Example 12.7 was calibrated to iTraxx tranche and index spread data for
the years 2004, 2006, 2008 and 2009; all contracts had a maturity of five years.
The data from 2004 and 2006 are typical for tranche and index spreads before the
credit crisis; the data from 2008 and 2009 represent the state of the market during
the crisis. Entropy minimization was used in order to determine a solution $\pi^*$ of
the calibration problem. The resulting values of $\pi^*$ are given in Table 12.5. We
clearly see that with the emergence of the credit crisis the calibration procedure puts
more mass on states where the default intensity is high; in particular, the extreme
state where $\lambda = 70\%$ gets a probability of around 3%. This reflects the increased
awareness of future defaults and the increasing risk aversion in the market after the
beginning of the crisis.

Table 12.5. Results of the calibration of a homogeneous implied copula model to iTraxx
spread data (index and tranches) for different data sets from several years; the components of
π* are expressed as percentages. The numerical results are from Frey and Schmidt (2011).

λ (in %)              0.01    0.3    0.6    1.2    2.5    4.0    8.0     20     70

π*, data from 2004   12.66   22.9   42.0   17.6    2.5   1.45   0.54   0.13   0.03
π*, data from 2006    22.2   29.9   39.0    7.6    1.2   0.16   0.03   0.03   0.05
π*, data from 2008     1.1    7.9   57.6   10.8   11.7    4.9   1.26   1.79   2.60
π*, data from 2009     0.0   13.6   6.35   42.2   22.3   12.5    0.0   0.00   3.06

The implied copula model can usually be calibrated very well to tranche and index 
spreads with a single maturity; calibrating tranche spreads for several maturities 
simultaneously is more involved (see, for example, Brigo, Pallavicini and Torresetti 
(2010) for details). 


Notes and Comments 


Semianalytic approaches for the pricing of synthetic CDOs in factor copula models 
have been developed by Laurent and Gregory (2005), Hull and White (2004), Gibson 
(2004) and Andersen and Sidenius (2004), among others. Laurent and Gregory 
exploit the conditional independence structure of factor copula models and develop 
methods based on Fourier analysis; Andersen and Sidenius, Gibson and Hull and 
White propose recursive methods. The LPA is originally due to Vasicek (1997). Frey, 
Popp and Weber (2008) propose the normal approximation as a simple yet efficient 
alternative to the LPA. A very deep study of normal and Poisson approximations 
for the sum of independent random variables with applications to CDO pricing is 
El Karoui, Kurtz and Jiao (2008). 

There is a rich literature on general one-factor copula models in relation to implied
correlation skews: the double-t copula has been studied in Hull and White (2004) 
and Vrins (2009); double-GH copulas have been analysed by Eberlein, Frey and 
von Hammerstein (2008), Guégan and Houdain (2005) and Kalemanova, Schmid 
and Werner (2005), among others. The idea of using state-dependent recovery rates 
to improve the fit of CDO pricing models is explored, for instance, in Hull and 
White (2006). The base-correlation approach is used in a number of (fairly dubious)
extrapolation procedures for the pricing of non-standard tranches. An in-depth dis- 
cussion of implied correlation skews in credit risk can be found in Brigo, Pallavicini 
and Torresetti (2010). 

The implied copula model is due to Hull and White (2006); similar ideas can 
be found in Rosen and Saunders (2009) and in Frey and Schmidt (2012). Hull and 
White (2006) and Rosen and Saunders (2009) discuss the pricing of bespoke CDO 
tranches in the context of the model. The calibration of implied copula models to 
inhomogeneous portfolios is discussed in detail in Rosen and Saunders (2009) and 
in Frey and Schmidt (2012). An empirical assessment of the calibration properties 
of implied copula models is also given in Brigo, Pallavicini and Torresetti (2010). 
Calibration methods based on entropy minimization are discussed by Avellaneda 
(1998). 


13 


Operational Risk and Insurance Analytics 


We have so far concentrated on the modelling of market and credit risk, which 
reflects the historical development of quantitative risk management in the banking 
context. Some of the techniques we have discussed are also relevant in operational 
risk modelling, particularly the statistical models of extreme value theory (EVT) in 
Chapter 5 and the aggregation methodology of Chapter 8. But we also need other 
techniques tailored specifically to operational risk, and we believe that actuarial 
models used in non-life insurance are particularly relevant. 

In the first half of this chapter (Section 13.1) we examine the Basel requirements 
for the quantitative modelling of operational risk in banks, discussing various poten- 
tial approaches. On the basis of industry data we highlight the challenges involved 
in implementing a so-called advanced measurement (AM) approach based on mod- 
elling loss distributions, also known as the loss distribution approach (LDA). 

In operational risk there is no consensus concerning the best modelling approach. 
In contrast to market and credit risk, the data sources are more limited and the overall 
statistical properties of available data show a high degree of non-homogeneity and 
non-stationarity, defying straightforward applications of statistical tools. Rather than 
offering specific models, the current chapter aims to provide a set of tools that can 
be used to learn more about this important but difficult-to-model risk category. 

In Section 13.2 we summarize the techniques from actuarial modelling that are 
relevant to operational risk, under the heading of insurance analytics. Our discussion 
in that section, though motivated by quantitative modelling of operational risk, 
has much wider applicability in quantitative risk management. For example, some 
techniques have implicitly been used in the credit risk chapters. The Notes and 
Comments section at the end of the chapter gives an overview of further techniques 
from insurance mathematics that have potential applications in the broader field of 
quantitative risk management. 


13.1 Operational Risk in Perspective 
13.1.1 An Important Risk Class 


In our overview of the development of the Basel regulatory framework in Sec- 
tion 1.2.2 we explained how operational risk was introduced under Basel II as 
a new risk class for which financial institutions were bound to set aside regulatory 
capital. It has also been incorporated into the Solvency II framework for insurers, 
although we will concentrate on the banking treatment in this chapter. 
We first recall the Basel definition as it appears in the final comprehensive version
of the Basel II document (Basel Committee on Banking Supervision 2006). 


Operational risk is defined as the risk of loss resulting from inade- 
quate or failed internal processes, people and systems or from external 
events. This definition includes legal risk, but excludes strategic and 
reputational risk. 


Examples of losses due to operational risk. Examples of losses falling within this 
category are, for instance, fraud (internal as well as external), losses due to IT fail- 
ures, errors in settlements of transactions, litigation and losses due to external events 
like flooding, fire, earthquake or terrorism. Losses due to unfortunate management 
decisions, such as many of the mergers and acquisitions in the two decades leading 
up to the 2007-9 financial crisis, are definitely not included. However, the fact that 
legal risk is part of the definition has had a considerable impact on the financial 
industry in the aftermath of the crisis. 

An early case that touched upon almost all aspects of the above definition was 
that of Barings (see also Section 1.2.2). From insufficient internal checks and bal- 
ances (processes), to fraud (human risk), to external events (the Kobe earthquake), 
many operational risk factors contributed to the downfall of this once-renowned 
merchant bank. Further examples include the $691 million rogue trading loss at 
Allfirst Financial, the $484 million settlement due to misleading sales practices at 
Household Finance, and the estimated $140 million loss for the Bank of New York 
stemming from the September 11 attacks. 

More recent examples in which legal risk has been involved include LIBOR 
rigging, for which the European Commission fined eight large financial institutions a 
total of $2.3 billion, and the possible rigging of rates in the foreign-exchange market, 
a market with an estimated $5.35 trillion daily turnover. Following the 2007-9 
financial crisis several financial institutions faced fines for misselling of financial 
products—particularly securitized credit products—on the basis of inaccurate or 
misleading information about their risks. In the latter category, the Bank of America 
was fined the record amount of $16.65 billion on 21 August 2014. 

Increasing use of algorithmic and high-frequency trading has resulted in opera- 
tional losses. Examples include the 2010 Flash Crash and an estimated $440 million 
loss from a computer-trading glitch at Knight Capital Group. A number of promi- 
nent cases involving large losses attributable to rogue traders have also been widely 
reported in the press, including Fabrice Tourre at Goldman Sachs, Kweku Adoboli 
at UBS, Jérôme Kerviel at Société Générale and Bruno Iksil (also known as the 
London Whale) at JPMorgan Chase. 

All the examples cited above, and the seriousness with which they have been 
taken by regulators worldwide, offer clear proof of the fundamental importance 
of operational risk as a risk class. Many banks have been forced to increase the 
capital they hold for operational risk (in some cases by up to 50%). An example of a 
regulatory document that probes more deeply into a trading loss at UBS is FINMA 
(2012). 
Distinctiveness of operational risk. An essential difference between operational
risk, on the one hand, and market and credit risk, on the other, is that operational 
risk has no upside for a bank. It comes about through the malfunctioning of parts of 
daily business and hence is as much a question of quality control as anything else. 
Although banks try as hard as possible to reduce operational risk, operational losses 
continue to occur. 

Despite their continuing occurrence, a lack of publicly available, high-quality 
operational loss data has been a major issue in the development of operational 
risk models. This is similar to the problem faced by underwriters of catastrophe 
insurance. The insurance industry’s answer to the problem has involved data pool- 
ing across industry participants, and similar developments are now taking place 
in the banking industry. Existing sources of pooled data include the Quantitative 
Impact Studies (QISs) of the Basel Committee, the database compiled by the Federal 
Reserve Bank of Boston, and subscription-based services for members like that of 
the ORX (Operational Riskdata eXchange Association). Increasingly, private com- 
panies are also providing data. However, the lack of more widely accessible data at 
the individual bank level remains a major problem. As data availability improves, 
many of the methods discussed in this book (such as the extreme value models of 
Chapter 5 and the insurance analytics of Section 13.2) will become increasingly 
useful. 


Elementary versus advanced measurement approaches. In Section 13.1.3 we dis- 
cuss the advanced measurement (AM) approach, which is typically adopted by 
larger banks that have access to high-quality operational loss data. This approach 
is often referred to as the loss distribution approach (LDA), since a series of distri- 
butional models or stochastic processes are typically fitted to operational loss data 
that have been categorized into different types of loss. 

First, however, we discuss the so-called elementary approaches to operational 
risk modelling. In these approaches, aimed at smaller banks, the detailed modelling 
of loss distributions for different loss types is not required; a fairly simple volume- 
based capital charge is proposed. 

We note that, as in the case of credit risk, the approaches proposed in the Basel 
framework for the calculation of regulatory capital represent a gradation in com- 
plexity. Recall that, for credit risk, banks must implement either the standardized 
approach or the internal-ratings-based (IRB) approach, as discussed in Section 1.3.1. 
The field of regulation is in a constant state of flux and the detail of the methods 
we describe below may change over time but the underlying principles are likely to 
continue to hold. 


13.1.2 The Elementary Approaches 


There are two elementary approaches to operational risk measurement. Under the
basic-indicator (BI) approach, banks must hold capital for operational risk equal to
the average over the previous three years of a fixed percentage (denoted by $\alpha$) of
positive annual gross income (GI). Figures for any year in which annual gross income
is negative or zero should be excluded from both the numerator and denominator
when calculating the average. The risk capital under the BI approach for operational
risk in year $t$ is therefore given by

$$\mathrm{RC}^t_{\mathrm{BI}}(\mathrm{OR}) = \frac{1}{Z_t} \sum_{i=1}^{3} \alpha \max(\mathrm{GI}^{t-i}, 0), \tag{13.1}$$

where $Z_t = \sum_{i=1}^{3} I_{\{\mathrm{GI}^{t-i} > 0\}}$ and $\mathrm{GI}^{t-i}$ stands for gross income in year $t - i$. Note
that an operational risk capital charge is calculated on a yearly basis. The BI approach
gives a fairly straightforward, volume-based, one-size-fits-all capital charge. Based
on the various QISs, the Basel Committee suggests a value of $\alpha$ of 15%.
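
The arithmetic in (13.1) is easily checked on a small example. The following sketch is ours: the function name and the gross-income figures are hypothetical, and returning a zero charge when no year has positive gross income is an assumption of the sketch, since the formula itself is undefined in that case.

```python
def basic_indicator_capital(gross_incomes, alpha=0.15):
    """Capital charge (13.1) from the gross incomes of the previous three years.

    Years with zero or negative gross income are dropped from both the sum
    and the divisor, so the divisor is the number of positive years.
    """
    positive = [gi for gi in gross_incomes if gi > 0]
    if not positive:
        return 0.0  # assumption of this sketch: no positive year, no charge
    return alpha * sum(positive) / len(positive)


# Hypothetical gross incomes (in millions) for years t-1, t-2 and t-3.
charge = basic_indicator_capital([120.0, -30.0, 90.0])
print(charge)
```

In this example the loss-making year is ignored, so the charge is 15% of the average of 120 and 90 rather than 15% of the average over all three years.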

Under the standardized approach, banks’ activities are divided into eight busi- 
ness lines: corporate finance; trading & sales; retail banking; commercial banking; 
payment & settlement; agency services; asset management; and retail brokerage. 
Precise definitions of these business lines are to be found in the final Basel II docu- 
ment (Basel Committee on Banking Supervision 2006). Within each business line, 
gross income is a broad indicator that serves as a proxy for the scale of business 
operations and thus the likely scale of operational risk exposure. The capital charge
for each business line is calculated by multiplying gross income by a factor (denoted
by $\beta$) assigned to that business line. As in (13.1), the total capital charge is calculated
as a three-year average over positive GIs, resulting in the following capital charge
formula:

$$\mathrm{RC}^t_{\mathrm{S}}(\mathrm{OR}) = \frac{1}{3} \sum_{i=1}^{3} \max\Bigl( \sum_{j=1}^{8} \beta_j \mathrm{GI}_j^{t-i}, 0 \Bigr). \tag{13.2}$$


It may be noted that in formula (13.2), in any given year t — i, negative capital 
charges (resulting from negative gross income) in some business line j can off- 
set positive capital charges in other business lines (albeit at the discretion of the 
national supervisor). This kind of “netting” should induce banks to go from the 
basic-indicator approach to the standardized approach; the word “netting” is of 
course to be used with care here. Based on the QISs, the Basel Committee has set 
the beta coefficients as in Table 13.1. Moscadelli (2004) gives a critical analysis of 
these beta factors, based on the full database of more than 47 000 operational losses 
of the second QIS of the summer of 2002 (see also Section 13.1.4). Concerning the 
use of GI in (13.1) and (13.2), see Notes and Comments. 
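
The within-year offsetting in (13.2) can be seen in a short numerical sketch. The beta factors are those of Table 13.1; the function name, the data layout and the gross-income figures are hypothetical and serve only to illustrate the formula.

```python
# Beta factors from Table 13.1.
BETAS = {
    "corporate finance": 0.18,
    "trading & sales": 0.18,
    "retail banking": 0.12,
    "commercial banking": 0.15,
    "payment & settlement": 0.18,
    "agency services": 0.15,
    "asset management": 0.12,
    "retail brokerage": 0.12,
}


def standardized_capital(gross_income_by_year, betas=BETAS):
    """Capital charge (13.2).

    gross_income_by_year holds three dicts (years t-1, t-2, t-3), each mapping
    a business line to its gross income; lines that are absent count as zero.
    """
    yearly = [
        max(sum(betas[line] * gi for line, gi in year.items()), 0.0)
        for year in gross_income_by_year
    ]
    return sum(yearly) / 3.0   # always divided by three, unlike (13.1)


# Hypothetical gross incomes in millions.
history = [
    {"retail banking": 100.0, "trading & sales": -50.0},   # 12 - 9 = 3
    {"retail banking": 100.0, "trading & sales": -200.0},  # 12 - 36 < 0, floored at 0
    {"commercial banking": 200.0},                         # 30
]
charge = standardized_capital(history)
print(charge)
```

In the first year the negative trading income reduces the charge without eliminating it; in the second year the weighted sum is negative and the year contributes nothing, but it still counts in the divisor.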


13.1.3 Advanced Measurement Approaches 


Under an AM approach, the regulatory capital is determined by a bank’s own inter- 
nal risk-measurement system according to a number of quantitative and qualitative 
criteria set forth in the regulatory documentation (Basel Committee on Banking 
Supervision 2006). We will not detail every relevant step in the procedure that leads 
to the acceptance of an AM approach for an internationally active bank and its sub- 
sidiaries; the Basel Committee’s documents give a clear and readable account of 
this. We focus instead on the methodological aspects of a full quantitative approach 
to operational risk measurement. It should be stated, however, that, as in the case of
market and credit risk, the adoption of an AM approach to operational risk is subject
to approval and continuing quality checking by the national supervisor.

Table 13.1. Beta factors for the standardized approach.

Business line (j)                  Beta factors (β_j)
j = 1, corporate finance           18%
j = 2, trading & sales             18%
j = 3, retail banking              12%
j = 4, commercial banking          15%
j = 5, payment & settlement        18%
j = 6, agency services             15%
j = 7, asset management            12%
j = 8, retail brokerage            12%

While the BI and standardized approaches prescribe the explicit formulas (13.1) 
and (13.2), the AM approach lays down general guidelines. In the words of the Basel 
Committee (see Basel Committee on Banking Supervision 2006, paragraph 667): 


Given the continuing evolution of analytical approaches for operational 
risk, the Committee is not specifying the approach or distributional 
assumptions used to generate the operational risk measure for regula- 
tory capital purposes. However, a bank must be able to demonstrate 
that its approach captures potentially severe “tail” loss events. What- 
ever approach is used, a bank must demonstrate that its operational risk 
measure meets a soundness standard comparable to that of the internal 
ratings-based approach for credit risk (comparable to a one year holding 
period and the 99.9 percent confidence interval). 


In the usual LDA interpretation of an AM approach, operational losses are typically 
categorized according to the eight business lines mentioned in Section 13.1.2 as well 
as the following seven loss-event types: internal fraud; external fraud; employment 
practices & workplace safety; clients, products & business practices; damage to 
physical assets; business disruption & system failures; and execution, delivery & 
process management. While the categorization of losses in terms of eight business 
lines and seven loss-event types is standard, banks may deviate from this format if 
appropriate. 

Banks are expected to gather internal data on repetitive, high-frequency losses 
(three to five years of data), as well as relevant external data on non-repetitive 
low-frequency losses. Moreover, they must add stress scenarios both at the level of 
loss severity (parameter shocks to model parameters) and correlation between loss 
types. In the absence of detailed joint models for different loss types, risk measures 
for the aggregate loss should be calculated by summing across the different loss 
categories. In general, both so-called expected and unexpected losses should be 
taken into account (i.e. risk-measure estimates cannot be reduced by subtraction of 
an expected loss amount). 
We now describe a skeletal version of a typical AM solution for the calculation
of an operational risk charge for year $t$. We assume that historical loss data from
previous years have been collected in a data warehouse with the structure

$$\{X_k^{t-i,b,\ell} : i = 1, \dots, T;\ b = 1, \dots, 8;\ \ell = 1, \dots, 7;\ k = 1, \dots, N^{t-i,b,\ell}\}, \tag{13.3}$$

where $X_k^{t-i,b,\ell}$ stands for the $k$th loss of type $\ell$ for business line $b$ in year $t - i$;
$N^{t-i,b,\ell}$ is the number of such losses and $T \ge 5$ years, say. Note that thresholds
may be imposed for each $(i, b, \ell)$ category, and small losses less than the threshold
may be neglected; a threshold is typically of the order of €10 000. The total historical
loss amount for business line $b$ in year $t - i$ is obviously


$$L^{t-i,b} = \sum_{\ell=1}^{7} \sum_{k=1}^{N^{t-i,b,\ell}} X_k^{t-i,b,\ell}, \tag{13.4}$$

and the total loss amount for year $t - i$ is

$$L^{t-i} = \sum_{b=1}^{8} L^{t-i,b}. \tag{13.5}$$

The problem in the AM approach is to use the loss data to estimate the distribution
of $L^t$ for year $t$ and to calculate risk measures such as VaR or expected shortfall
(see Section 2.3) for the estimated distribution. Writing $\varrho_\alpha$ for the risk measure at a
confidence level $\alpha$, the regulatory capital is determined by

$$\mathrm{RC}^t_{\mathrm{AM}}(\mathrm{OR}) = \varrho_\alpha(L^t), \tag{13.6}$$


where $\alpha$ would typically take a value in the range 0.99-0.999 imposed by the
local regulator. Because the joint distributional structure of the losses in (13.4)
and (13.5) for any given year is generally unknown, we would typically resort to
simple aggregation of risk measures across loss categories to obtain a formula of
the form

$$\mathrm{RC}^t_{\mathrm{AM}}(\mathrm{OR}) = \sum_{b=1}^{8} \varrho_\alpha(L^{t,b}). \tag{13.7}$$


In view of our discussions in Chapter 8, the choice of an additive rule in (13.7)
can be understood. Indeed, for any coherent risk measure $\varrho_\alpha$, the right-hand side
of (13.7) yields an upper bound for the total risk $\varrho_\alpha(L^t)$. In the important case of
VaR, the right-hand side of (13.7) corresponds to the comonotonic scenario (see
Proposition 7.20). The optimization results of Section 8.4.4 can be used to calculate
bounds for $\varrho_\alpha(L^t)$ under different dependence scenarios for the business lines (see,
in particular, Example 8.40).

Reduced to its most stylized form in the case when $\varrho_\alpha = \mathrm{VaR}_\alpha$ and $\alpha = 0.999$,
a capital charge under the AM approach requires the calculation of a quantity of the
type

$$\mathrm{VaR}_{0.999}\Bigl( \sum_{k=1}^{N} X_k \Bigr), \tag{13.8}$$

where $(X_k)$ is some sequence of loss severities and $N$ is an rv describing the frequency
with which operational losses occur. Random variables of the type (13.8) are
one of the prime examples of the actuarial models that we treat in Section 13.2.2. 
Before we move on to those models in the next section, we highlight some “stylized 
facts” concerning operational loss data. 
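
A quantity like (13.8) is rarely available in closed form, and a brute-force simulation gives a first impression of what is involved. The sketch below is an illustration under assumptions of our own and not a recommended capital model: the frequency $N$ is taken to be Poisson, the severities are independent generalized Pareto variables with shape $\xi > 0$, and all parameter values and function names are hypothetical.

```python
import math
import random


def poisson(rng, lam):
    """Sample a Poisson(lam) count as the number of unit-rate arrivals in [0, lam]."""
    count, clock = 0, rng.expovariate(1.0)
    while clock <= lam:
        count += 1
        clock += rng.expovariate(1.0)
    return count


def simulate_annual_losses(n_years, lam, xi, beta, seed=1):
    """Simulate sum_{k=1}^N X_k with N ~ Poisson(lam) and GPD(xi, beta) severities."""
    rng = random.Random(seed)
    totals = []
    for _ in range(n_years):
        n = poisson(rng, lam)
        # Inverse of the GPD df: x = beta / xi * ((1 - u)^(-xi) - 1), for xi > 0.
        totals.append(sum(beta / xi * ((1.0 - rng.random()) ** (-xi) - 1.0)
                          for _ in range(n)))
    return totals


def empirical_var(losses, alpha):
    """Empirical alpha-quantile read off the sorted sample."""
    ordered = sorted(losses)
    index = max(math.ceil(alpha * len(ordered)) - 1, 0)
    return ordered[index]


# Hypothetical parameters: 25 losses per year on average, tail index 0.7.
totals = simulate_annual_losses(n_years=20000, lam=25.0, xi=0.7, beta=1.0)
var_99 = empirical_var(totals, 0.99)
var_999 = empirical_var(totals, 0.999)
print(var_99, var_999)
```

With 20 000 simulated years only about 20 observations lie beyond the 99.9% level, so the estimate of the higher quantile is far less stable than that of the 99% quantile; rerunning with a different `seed` makes this visible.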


13.1.4 Operational Loss Data 


In order to reliably estimate (13.6), (13.7) or, in a stylized version, a quantity 
like (13.8), we need extensive data. The data situation for operational risk is much 
worse than that for credit risk, and is clearly an order of magnitude worse than for 
market risk, where vast quantities of data are publicly available. As discussed in 
Section 13.1.1, banks have not been gathering data for long and pooling initiatives 
are still in their infancy. As far as we know, no reliable publicly available data source 
on operational risk exists. 

Our discussion below is based on industry data that we have been able to analyse 
as well as on the findings in Moscadelli (2004) for the QIS database and the results 
of the 2004 loss-data collection exercise by the Federal Reserve Bank of Boston (see 
Federal Reserve Bank of Boston 2005). An excellent overview of some of the data 
characteristics is to be found in the Basel Committee’s report (Basel Committee on 
Banking Supervision 2003). As the latter report states: 


Despite this progress, inferences based on the data should still be made 
with caution. ... In addition, the most recent data collection exercise pro- 
vides data for only one year and, even under the best of circumstances, 
a one-year collection window will provide an incomplete picture of the 
full range of potential operational risk events, especially of rare but 
significant “tail events”. 


Further information is to be found in the increasing number of papers on the topic, 
and particularly in the various papers coming out of the ORX Consortium (see Notes 
and Comments). 

In Figure 13.1 we have plotted operational loss data obtained from several sources;
parts (a)–(c) show losses for three business lines for the period 1992–2001. It is less
important for the reader to know the exact loss type; it is sufficient to accept that
the data are typical for $(b, \ell)$ categories in (13.3). In part (d) the data from the three
previous figures have been pooled.

Exploratory data analysis reveals the following stylized facts (confirmed in several 
other studies): 


• loss severities have a heavy-tailed distribution;
• losses occur randomly in time; and
• loss frequency may vary substantially over time.

The third observation is partly explained by the fact that banks have gathered
an increasing amount of operational loss data since the Basel II rules were first
announced. There is therefore a considerable amount of reporting bias, resulting
in fewer losses in the first half of the 1990s and more losses afterwards. Moreover,
several classes of loss may have a considerable cyclical component and/or may
depend on changing economic covariables. For instance, back-office errors may
depend on volume traded and fraud may be linked to the overall level of the economy
(depressions versus boom cycles). Moreover, there may be a rise in legal losses
in the aftermath of a severe crisis, as has been observed for the 2007-9 credit crisis.
This clear inhomogeneity in the loss frequency makes an immediate application of
statistical methodology difficult. However, it may be reasonable to at least assume
that the (inflation-adjusted) loss sizes have a common severity distribution, which
would allow, for instance, the application of methods from Chapter 5.

Figure 13.1. Operational risk losses: (a) type 1, n = 162; (b) type 2, n = 80;
(c) type 3, n = 175; and (d) pooled losses n = 417.

In Figure 13.2 we have plotted the sample mean excess functions (5.16) for the 
data in Figure 13.1. This figure clearly indicates the first stylized fact of heavy- 
tailed loss severities. The mean excess plots in (a) and (b) are clearly increasing in 
an approximately linear fashion, pointing to Pareto-type behaviour. This contrasts 
with (c), where the plot appears to level off from a threshold of 1. This hints at a 
loss distribution with finite upper limit, but this can only be substantiated by more 
detailed knowledge of the type of loss concerned. Pooling the data in (d) masks the 
different kinds of behaviour, and perhaps illustrates the dangers of naive statistical 
analyses that do not consider the data-generating mechanism. 
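
The sample mean excess function used for such plots is simple to compute: for a threshold $v$ it is the average of the excesses $X_i - v$ over those observations that exceed $v$. The following sketch shows the calculation and contrasts a simulated Pareto sample, for which the function grows roughly linearly, with a simulated exponential sample, for which it stays roughly flat. The function names, the sample sizes, the seed and the thresholds are our own hypothetical choices; the simulated samples stand in for data and are not the losses of Figure 13.1.

```python
import random


def mean_excess(data, v):
    """Sample mean excess over threshold v: average of (x - v) over all x > v."""
    excesses = [x - v for x in data if x > v]
    if not excesses:
        raise ValueError("no observations exceed the threshold")
    return sum(excesses) / len(excesses)


def mean_excess_plot_points(data):
    """Pairs (v, e_n(v)) with v running over the distinct observations below the maximum."""
    thresholds = sorted(set(data))[:-1]
    return [(v, mean_excess(data, v)) for v in thresholds]


# Tiny hand-checkable example.
small = [1.0, 2.0, 3.0, 4.0, 10.0]
points = mean_excess_plot_points(small)

# Simulated comparison: Pareto (heavy tailed) versus exponential (light tailed).
rng = random.Random(0)
pareto_sample = [rng.paretovariate(2.0) for _ in range(5000)]
expo_sample = [rng.expovariate(1.0) for _ in range(5000)]
for v in (1.5, 2.0, 3.0):
    print(v, mean_excess(pareto_sample, v), mean_excess(expo_sample, v))
```

Estimates at high thresholds rest on very few exceedances, which is why the right-hand end of a mean excess plot should be read with caution.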

Moscadelli (2004) performed a detailed EVT analysis (including a first attempt to
solve the frequency problem) of the full QIS data set of more than 47 000 operational
losses and concluded that the loss dfs are well fitted by generalized Pareto distributions
(GPDs) in the upper-tail area (see Section 5.2.2 for the necessary statistical
background). The estimated tail parameters ($\xi$ in (5.14)) for the different business
lines range from 0.85 for asset management to 1.39 for commercial banking. Six of
the business lines have an estimate of $\xi$ greater than 1, corresponding to an
infinite-mean model! Based on these QIS data, the estimated risk capital/GI ratios (the $\beta$ in
Table 13.1) range from 8.3% for retail banking to 33.3% for payment & settlement,
with an overall alpha value (see (13.1)) of 13.3%, slightly below the Basel II value
of 15% used in the BI approach. Note the much wider range of values of $\beta$ that
emerge from the analysis of the QIS data compared with the prescribed range of
12-18% for the standardized approach in Table 13.1.

Figure 13.2. Corresponding sample mean excess plots for the data in Figure 13.1:
(a) type 1; (b) type 2; (c) type 3; and (d) pooled.


Notes and Comments 


Several textbooks on operational risk have been published: see, for example, Cruz 
(2002, 2004), King (2001), the Risk Books publication edited by Jenkins and Roberts 
(2004), and chapters in Ong (2003) and Crouhy, Galai and Mark (2001). In particular, 
Chapter 4 of Cruz (2004), written by Carolyn Currie, gives an excellent overview 
of the regulatory issues surrounding operational risk. Further textbooks include 
Shevchenko (2011), Cruz, Peters and Shevchenko (2015) and Panjer (2006). The 
Journal of Operational Risk publishes relevant research from academia and practice, 
including informative papers coming out of the ORX Consortium: see, for example, 
Cope and Antonini (2008) and Cope and Labbi (2008). 

Papers on implementing operational risk models in practice include Ebnöther
et al. (2003); Frachot, Georges and Roncalli (2001), which discusses the loss dis- 
tribution approach to operational risk; Doebeli, Leippold and Vanini (2003), which 
shows that a good operational risk framework may lead to an overall improvement 
in the quality of business operations; and Aue and Kalkbrener (2006), which pro- 
vides a comprehensive description of an approach based on loss distributions that 
was developed at Deutsche Bank. Excellent data-analytic papers using published 
operational risk losses are de Fontnouvelle et al. (2003) and Moscadelli (2004). 
Rosenberg and Schuermann (2006) address the aggregation of market, credit and 
operational risk measures. A comprehensive study, combining internal and external 
data, together with expert opinion, in a scenario-generation context is de Jongh et al.
(2014); this paper also contains an excellent list of references. 

Figure 13.1 is taken from Embrechts, Kaufmann and Samorodnitsky (2004). 
This paper also stresses the important difference between so-called repetitive and 
non-repetitive losses. For the former (to some extent less important) losses, statis- 
tical modelling can be very useful. For non-repetitive, low-probability, high-severity 
losses, much more care has to be taken before a statistical analysis can be performed 
(see Pézier 2003a,b). 

EVT methods for operational risk quantification have been used by numerous 
authors (see, for example, Coleman 2002, 2003; Medova 2000; Medova and Kyria- 
cou 2000). Because of the non-stationarity of operational loss data over several years, 
more refined EVT models are called for: see, for example, Chavez-Demoulin and 
Embrechts (2004) and Chavez-Demoulin, Embrechts and Hofert (2014) for some 
examples of such models. For a critical article on the use of EVT for the calculation 
of an operational risk capital charge, see Embrechts, Furrer and Kaufmann (2003), 
which contains a simulation study of the number of data needed to come up with 
a reasonable estimate of a high quantile. The use of statistical methods other than
EVT is discussed in the textbooks referred to above. These methods include linear
predictive models, Bayesian belief networks, discriminant analysis, and tools and 
techniques from reliability theory. 

We have noted that severity models for operational risk are typically (extremely) 
heavy tailed. Several publications report infinite-mean models. For an early discussion
of this issue, see Nešlehová, Embrechts and Chavez-Demoulin (2006). The
reader interested in a more philosophical discussion of the economic sense of such 
models should carry out an internet search for “the dismal theorem” and read some 
of the material that has been written on this topic. The term “dismal theorem” was 
coined by Martin Weitzman, who introduced it in relation to the economics of catas- 
trophic climate change; an interesting paper on the topic is Weitzman (2011). 

It should be clear from what we learned in earlier chapters about risk measures, 
their aggregation and their statistical estimation that calculating a 99.9% yearly VaR 
for operational risk is, to put it mildly, a daunting task. As a consequence, Ames, 
Schuermann and Scott (2014) suggest several regulatory policy changes leading to 
simpler, more standardized, more stable and more robust methodologies, at least 
until our understanding of operational risk has increased. In a recent publication, 
the Basel Committee on Banking Supervision (2014) proposes a change from the 
gross income (GI) indicator in (13.1) and (13.2) to a new business indicator (BI). 


13.2 Elements of Insurance Analytics 
13.2.1 The Case for Actuarial Methodology 


Actuarial tools and techniques for the modelling, pricing and reserving of insurance 
products in the traditional fields of life insurance, non-life insurance and reinsurance 
have a long history going back more than a century. More recently, the border 
between financial and insurance products has become blurred, examples of this
process being equity-linked life products and the transfer of insurance risks to the
capital markets via securitization (see Chapter 1, particularly Section 1.5.3 and the 
Notes and Comments). 

Whereas some of the combined bank-assurance products have not met with the 
success that was originally hoped for, it remains true that there exists an increasing 
need for financial and actuarial professionals who can close the methodological gaps 
between the two fields. In the sections that follow we discuss insurance-analytical 
tools that we believe the more traditional finance-oriented risk manager ought to be 
aware of; the story behind the name insurance analytics can be found in Embrechts 
(2002). 

It is not only the occasional instance of joint product development between the 
banking and insurance worlds that prompts us to make a case for actuarial method- 
ology in QRM, but also the observation that many of the concepts and techniques 
of QRM described in the preceding chapters are in fact borrowed from the actuarial 
literature. 


• Risk measures like expected shortfall (Definition 2.12) have been studied in
  a systematic way in the insurance literature. Expected shortfall is also the
  standard risk measure to be used under the Solvency II guidelines.

• Many of the dependence modelling tools presented in Chapter 7 saw their first
  applications in the realm of insurance. Moreover, notions like comonotonicity
  of risk factors have their origins in actuarial questions.

• In Section 2.3.5 we discussed the axiomatization of financial risk measures and
  mentioned the parallel development of insurance premium principles (often
  with very similar goals and results).

• The statistical modelling of extremal events has been a bread-and-butter
  subject for actuaries since the start of insurance. Many of the tools presented in
  Chapter 5 are therefore well known to actuaries.

• Within the world of credit risk management, the industry model CreditRisk+
  (Section 11.2.5) is known as an actuarial model.

• The actuarial approach to the modelling of operational risk is apparent in the
  AM approach of Section 13.1.3.


In the sections that follow we give a brief discussion of relevant actuarial techniques. 
The material presented should enable the reader to transfer actuarial concepts to 
QRM in finance more easily. We do not strive for a full treatment of the relevant 
tools as that would fill a separate (voluminous) textbook (see, for example, Denuit 
and Charpentier (2004), Mikosch (2004) and Partrat and Besson (2004) for excellent 
accounts of many of the relevant techniques). 


13.2.2 The Total Loss Amount 


Reconsider formula (13.8), where a random number $N$ of random losses or severities
$X_k$ occurring in a given time period are summed. To apply a risk measure like
VaR we need to make assumptions about the $(X_k)$ and $N$, which leads us to one of
the fundamental concepts of (non-life) insurance mathematics.


Definition 13.1 (total loss amount and distribution). Denote by $N(t)$ the (random)
number of losses over a fixed time period $[0, t]$ and write $X_1, X_2, \ldots$ for the
individual losses. The total loss amount (or aggregate loss) is defined as

$$S_{N(t)} = \sum_{k=1}^{N(t)} X_k, \tag{13.9}$$

with df $F_{S_{N(t)}}(x) = P(S_{N(t)} \le x)$, the total (or aggregate) loss df. Whenever $t$ is
fixed, $t = 1$ say, we may drop the time index from the notation and simply write $S_N$
and $F_{S_N}$.


Remark 13.2. The definition of (13.9) as an rv is to be understood as
$S_{N(t)}(\omega) = \sum_{k=1}^{N(t)(\omega)} X_k(\omega)$, $\omega \in \Omega$, and is referred to as a random
(or randomly indexed) sum.


A prime goal of this section will be the analytical and numerical calculation
of $F_{S_N}$, which requires further assumptions about the $(X_k)$ and $N$.


Assumption 13.3 (independence, compound sums). We assume that the rvs $(X_k)$
are iid with common df $G$, $G(0) = 0$. We further assume that the rvs $N$ and $(X_k)$
are independent; in that case we refer to (13.9) as a compound sum. The probability
mass function of $N$ is denoted by $p_N(k) = P(N = k)$, $k = 0, 1, 2, \ldots$. The rv $N$
is referred to as the compounding rv.


Proposition 13.4 (compound distribution). Let $S_N$ be a compound sum and suppose
that Assumption 13.3 holds. Then, for all $x \ge 0$,

$$F_{S_N}(x) = P(S_N \le x) = \sum_{k=0}^{\infty} p_N(k)\, G^{(k)}(x), \tag{13.10}$$

where $G^{(k)}(x) = P(S_k \le x)$, the $k$th convolution of $G$. Note that $G^{(0)}(x) = 1$ for
$x \ge 0$, and $G^{(0)}(x) = 0$ for $x < 0$.


Proof. Suppose that $x \ge 0$. Then

$$F_{S_N}(x) = \sum_{k=0}^{\infty} P(S_N \le x \mid N = k)\, P(N = k) = \sum_{k=0}^{\infty} p_N(k)\, G^{(k)}(x).$$


Although formula (13.10) is explicit, its actual calculation in specific cases is
difficult because the convolution powers $G^{(k)}$ of a df $G$ are not generally
available in closed form. One therefore resorts to (numerical) approximation methods.
A first class of these uses the fact that the Laplace-Stieltjes transform of a
convolution is the product of the Laplace-Stieltjes transforms. Using the usual notation
$\hat{F}(s) = \int_0^\infty e^{-sx}\, dF(x)$, where $s \ge 0$, for Laplace-Stieltjes transforms, we have
that $\widehat{G^{(k)}}(s) = (\hat{G}(s))^k$. It follows from Proposition 13.4 that

$$\hat{F}_{S_N}(s) = \sum_{k=0}^{\infty} p_N(k)\, \hat{G}^k(s) = \Pi_N(\hat{G}(s)), \quad s \ge 0, \tag{13.11}$$

where $\Pi_N$ denotes the probability-generating function of $N$, defined by
$\Pi_N(s) = \sum_{k=0}^{\infty} p_N(k)\, s^k$.


Example 13.5 (the compound Poisson df). Suppose that $N$ has a Poisson df with
intensity parameter $\lambda > 0$, denoted by $N \sim \text{Poi}(\lambda)$. In that case,
$p_N(k) = e^{-\lambda}\lambda^k/k!$, $k \ge 0$, and, for $s > 0$,

$$\Pi_N(s) = \sum_{k=0}^{\infty} e^{-\lambda}\frac{(\lambda s)^k}{k!} = e^{-\lambda(1-s)}.$$

From (13.11) it therefore follows that, for $s \ge 0$,

$$\hat{F}_{S_N}(s) = \exp(-\lambda(1 - \hat{G}(s))).$$

In this case, the df of $S_N$ is referred to as the compound Poisson df and we write
$S_N \sim \text{CPoi}(\lambda, G)$.


Example 13.6 (the compound negative binomial df). Suppose that $N$ has a negative
binomial df with parameters $\alpha > 0$ and $0 < p < 1$, denoted by $N \sim \text{NB}(\alpha, p)$.
The probability mass function is given by (A.18) and we get, for $0 < s < (1-p)^{-1}$,

$$\Pi_N(s) = \sum_{k=0}^{\infty} \binom{\alpha+k-1}{k} p^{\alpha}(1-p)^k s^k = \left(\frac{p}{1 - s(1-p)}\right)^{\alpha}.$$

From (13.11) it therefore follows that, for $s \ge 0$,

$$\hat{F}_{S_N}(s) = \left(\frac{p}{1 - \hat{G}(s)(1-p)}\right)^{\alpha}.$$

In this case, the df of $S_N$ is referred to as the compound negative binomial df and
we write $S_N \sim \text{CNB}(\alpha, p, G)$.


Formula (13.11) facilitates the calculation of moments of $S_N$ and lends itself to
numerical evaluation through Fourier inversion, using a technique known as the fast
Fourier transform (FFT) (see Notes and Comments for references on the latter).
For the calculation of moments, note that, under the assumption of the existence of
sufficiently high moments and hence differentiability of $\hat{G}$ and $\Pi_N$, we obtain

$$\frac{d^k}{ds^k}\Pi_N(s)\Big|_{s=1} = E(N(N-1)\cdots(N-k+1))$$

and

$$(-1)^k \frac{d^k}{ds^k}\hat{G}(s)\Big|_{s=0} = E(X_1^k) = \mu_k.$$

Example 13.7 (continuation of Example 13.5). In the case of the compound Poisson
df one obtains

$$E(S_N) = (-1)\frac{d}{ds}\hat{F}_{S_N}(s)\Big|_{s=0} = \exp(-\lambda(1 - \hat{G}(0)))\,\lambda(-\hat{G}'(0)) = \lambda\mu_1 = E(N)E(X_1).$$

Similar calculations yield $\mathrm{var}(S_N) = E(S_N^2) - (E(S_N))^2 = \lambda\mu_2$.

For the general compound case one obtains the following useful result.


Proposition 13.8 (moments of compound dfs). Under Assumption 13.3 and
assuming that $E(N^2) < \infty$, $\mu_2 < \infty$, we have that

$$E(S_N) = E(N)E(X_1) \quad\text{and}\quad \mathrm{var}(S_N) = \mathrm{var}(N)(E(X_1))^2 + E(N)\,\mathrm{var}(X_1). \tag{13.12}$$


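Proof. This follows readily from (13.11), differentiating with respect to $s$. The
following direct proof avoids the use of transforms. Conditioning on $N$ and using
Assumption 13.3 one obtains

$$E(S_N) = E(E(S_N \mid N)) = E\Big(E\Big(\sum_{k=1}^{N} X_k \,\Big|\, N\Big)\Big) = E\Big(\sum_{k=1}^{N} E(X_k)\Big) = E(N)E(X_1)$$

and, similarly,

$$\begin{aligned}
E(S_N^2) &= E\Big(E\Big(\Big(\sum_{k=1}^{N} X_k\Big)^2 \,\Big|\, N\Big)\Big) = E\Big(E\Big(\sum_{k=1}^{N}\sum_{l=1}^{N} X_k X_l \,\Big|\, N\Big)\Big) \\
&= E(N\mu_2 + N(N-1)\mu_1^2) = E(N)\mu_2 + (E(N^2) - E(N))\mu_1^2 \\
&= E(N)\,\mathrm{var}(X_1) + E(N^2)(E(X_1))^2,
\end{aligned}$$

so $\mathrm{var}(S_N) = E(S_N^2) - (E(S_N))^2 = E(N)\,\mathrm{var}(X_1) + \mathrm{var}(N)(E(X_1))^2$.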


Remark 13.9. Formula (13.12) elegantly combines the randomness of the frequency
($\mathrm{var}(N)$) with that of the severity ($\mathrm{var}(X_1)$). In the compound Poisson case it reduces
to the formula $\mathrm{var}(S_N) = \lambda E(X_1^2) = \lambda\mu_2$, as in Example 13.7. In the
deterministic-sum case, when $P(N = n) = 1$, say, we find the well-known results $E(S_N) = n\mu_1$
and $\mathrm{var}(S_N) = n\,\mathrm{var}(X_1)$; indeed, in this degenerate case, $\mathrm{var}(N) = 0$.
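
As a rough numerical check of (13.12), the following Python sketch simulates a CPoi(100, Exp(1)) sum and compares the sample mean and variance with $E(N)E(X_1) = 100$ and $\lambda\mu_2 = 200$. The helper names, the seed and the sample size are our own illustrative choices; the Poisson count is drawn by counting exponential inter-event times, which is only one of several possible samplers.

```python
import random
import statistics


def poisson_draw(lam, rng):
    """Draw N ~ Poi(lam) by counting Exp(lam) inter-event times that fall in [0, 1]."""
    count, t = 0, rng.expovariate(lam)
    while t <= 1.0:
        count += 1
        t += rng.expovariate(lam)
    return count


def compound_poisson_draw(lam, severity, rng):
    """One realization of S_N = X_1 + ... + X_N with N ~ Poi(lam), N independent of the X_k."""
    return sum(severity(rng) for _ in range(poisson_draw(lam, rng)))


rng = random.Random(2024)            # fixed seed (arbitrary) for reproducibility
lam = 100.0


def exp1(r):
    """Severity X ~ Exp(1): E(X) = 1, var(X) = 1, E(X^2) = 2."""
    return r.expovariate(1.0)


sample = [compound_poisson_draw(lam, exp1, rng) for _ in range(10_000)]
mc_mean = statistics.fmean(sample)
mc_var = statistics.variance(sample)

# Formula (13.12) with var(N) = E(N) = lam in the Poisson case
theory_mean = lam * 1.0
theory_var = lam * 1.0**2 + lam * 1.0

print(f"mean: simulated {mc_mean:.2f}, theory {theory_mean:.2f}")
print(f"var:  simulated {mc_var:.2f}, theory {theory_var:.2f}")
```

Replacing `exp1` by another severity sampler (and the moments accordingly) gives the same check for other compound Poisson models.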


The compound Poisson model is a basic model for aggregate financial or insurance
risk losses. The ubiquitousness of the Poisson distribution in insurance can
be understood as follows. Consider a time interval $[0, 1]$ and let $N$ denote the total
number of losses in that interval. Suppose further that we have a number of potential
loss generators (transactions, credit positions, insurance policies, etc.) that can
produce, with probability $p_n$, one loss or, with probability $1 - p_n$, no loss in each
small subinterval $((k-1)/n, k/n]$ for $k = 1, \ldots, n$. Moreover, suppose that the
occurrence or non-occurrence of a loss in any particular subinterval is not influenced
by the occurrence of losses in other intervals. The number $N_n$ of losses then
has a binomial df with parameters $n$ and $p_n$, so

$$P(N_n = k) = \binom{n}{k} p_n^k (1 - p_n)^{n-k}, \quad k = 0, \ldots, n.$$

Combined with a loss-severity distribution this frequency distribution gives rise,
in (13.10), to the so-called binomial loss model. Next suppose that $n \to \infty$ in such
a way that $\lim_{n\to\infty} n p_n = \lambda > 0$. It follows from Poisson's theorem of rare events
(see also Section 5.3.1) that

$$\lim_{n\to\infty} P(N_n = k) = e^{-\lambda}\frac{\lambda^k}{k!}, \quad k = 0, 1, 2, \ldots,$$

i.e. $N_\infty \sim \text{Poi}(\lambda)$, explaining why the Poisson model assumption is very natural
as a frequency distribution and why the compound Poisson model is a common
aggregate loss model. The compound Poisson model has several nice properties,
one of which concerns aggregation and is useful in the operational risk context in
situations such as (13.5).
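
The rare-events limit can be made concrete with a few lines of Python. The sketch below fixes $\lambda = 3$ (an arbitrary illustrative value), sets $p_n = \lambda/n$ and records the largest pointwise gap between the binomial and Poisson probability mass functions over $k = 0, \ldots, 10$; the function names are ours.

```python
import math


def binom_pmf(k, n, p):
    """P(N_n = k) for N_n ~ B(n, p)."""
    return math.comb(n, k) * p**k * (1.0 - p) ** (n - k)


def poisson_pmf(k, lam):
    """P(N = k) for N ~ Poi(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)


lam = 3.0                      # illustrative intensity
gaps = {}
for n in (10, 100, 10_000):
    p_n = lam / n              # so that n * p_n = lam for every n
    gaps[n] = max(abs(binom_pmf(k, n, p_n) - poisson_pmf(k, lam)) for k in range(11))
    print(n, gaps[n])
```

The gap shrinks as $n$ grows, in line with the limit statement above.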


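Proposition 13.10 (sums of compound Poisson rvs). Suppose that the compound
sums $S_{N_i} \sim \text{CPoi}(\lambda_i, G_i)$, $i = 1, \ldots, d$, and that these rvs are independent. Then
$S_N = \sum_{i=1}^{d} S_{N_i} \sim \text{CPoi}(\lambda, G)$, where $\lambda = \sum_{i=1}^{d} \lambda_i$ and
$G = \sum_{i=1}^{d} (\lambda_i/\lambda) G_i$.


Proof. (For $d = 2$, the general case being similar.) Because of independence and
Example 13.5 we have, for the Laplace-Stieltjes transform of $S_N$,

$$\begin{aligned}
\hat{F}_{S_N}(s) &= \hat{F}_{S_{N_1}}(s)\,\hat{F}_{S_{N_2}}(s) \\
&= \exp\Big(-(\lambda_1 + \lambda_2)\Big(1 - \frac{1}{\lambda}\big(\lambda_1\hat{G}_1(s) + \lambda_2\hat{G}_2(s)\big)\Big)\Big) \\
&= \exp(-\lambda(1 - \hat{G}(s))),
\end{aligned}$$

where $\lambda = \lambda_1 + \lambda_2$ and

$$G = \frac{\lambda_1}{\lambda} G_1 + \frac{\lambda_2}{\lambda} G_2.$$

The result follows since the Laplace-Stieltjes transform uniquely determines the
underlying df.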


Figure 13.3. Histogram of simulated compound loss data (n = 100000) for
$S_N \sim \text{CPoi}(100, \text{Exp}(1))$ together with normal approximation (13.14).

The new intensity $\lambda$ is just the sum of the old ones, whereas the new severity df $G$
is a discrete mixture of the loss dfs $G_i$ with weights $\lambda_i/\lambda$, $i = 1, \ldots, d$. We can
easily simulate losses from such a model through a two-stage procedure: first draw $i$
($i = 1, \ldots, d$) with probability $\lambda_i/\lambda$, and then draw a loss with df $G_i$.
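
The two-stage procedure for drawing a loss from the mixture df $G = \sum_i (\lambda_i/\lambda) G_i$ can be sketched as follows. The two cells, their intensities and their exponential severities are hypothetical values chosen only to make the example checkable; `draw_mixture_loss` is our own name.

```python
import random


def draw_mixture_loss(lams, samplers, rng):
    """Two-stage draw from G = sum_i (lam_i / lam) G_i."""
    # Stage 1: pick cell i with probability lam_i / lam (weights are normalized internally)
    i = rng.choices(range(len(lams)), weights=lams, k=1)[0]
    # Stage 2: draw a loss from the severity df G_i of that cell
    return samplers[i](rng)


rng = random.Random(7)                 # arbitrary seed
lams = [2.0, 6.0]                      # hypothetical intensities lambda_1, lambda_2
samplers = [
    lambda r: r.expovariate(1.0),      # G_1 = Exp(1), mean 1
    lambda r: r.expovariate(0.25),     # G_2 = exponential with rate 1/4, mean 4
]

losses = [draw_mixture_loss(lams, samplers, rng) for _ in range(50_000)]
mixture_mean = sum(losses) / len(losses)

# Mean of the mixture severity: (2/8) * 1 + (6/8) * 4
expected_mean = (lams[0] * 1.0 + lams[1] * 4.0) / sum(lams)
print(mixture_mean, expected_mean)
```

Combining such severity draws with a Poi($\lambda$) count, $\lambda = \sum_i \lambda_i$, gives realizations of the aggregated compound Poisson sum.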


Beyond the Poisson model. The Poisson model can serve as a stylized representation
of the loss-generating mechanism from which more realistic models can be
derived. For instance, we may wish to introduce a time parameter in $N$ to capture
different occurrence patterns over time (see Section 13.2.6). Also, the intensity
parameter $\lambda$ may be assumed to be random (see Example 13.21). Indeed, a further
step is to turn $\lambda$ into a stochastic process, which gives rise to such models as
doubly stochastic (or Cox) processes (see Section 10.5.1) or self-exciting processes, as
encountered in Section 16.2.1. Furthermore, various forms of dependence among
the $X_k$ rvs or between $N$ and $(X_k)$ could be modelled. Finally, multiline portfolios
require multivariate models for vectors of the type $(S_{N_1}, \ldots, S_{N_d})$. An ultimate
goal of the AM approach to operational risk would be to model such random vectors
where, for instance, $d$ might stand for seven risk types, eight business lines, or a
total of 56 loss category cells.


13.2.3 Approximations and Panjer Recursion 


As mentioned in Section 13.2.2, the analytic calculation of $F_{S_N}$ is not possible
for the majority of reasonable models, which has led actuaries to come up with
several numerical approximations. Below we review some of these approximations
and illustrate their use for several choices of the severity df $G$. The basic example
we look at is the compound Poisson case, $S_N \sim \text{CPoi}(\lambda, G)$, though most of the
approximations discussed can be adjusted to deal with other distributions for $N$.
Given $\lambda$ and $G$ we can easily simulate from $F_{S_N}$ and, by repeating this many times, we can
get an empirical estimate that is close to the true df. Figure 13.3 contains a simulation
of n = 100000 realizations of $S_N \sim \text{CPoi}(100, \text{Exp}(1))$. Although the histogram
exhibits mild skewness (which can easily be shown theoretically (see (13.15))),
a clear central limit effect takes place. This is used in the first approximation below.


Normal approximation. As the loss rvs $X_i$ are iid (with finite second moment,
say) and $S_N$ is a (random) sum of the $X_i$ variables, one can apply Theorem 2.5.16
from Embrechts, Klüppelberg and Mikosch (1997) and Proposition 13.8 to obtain
the following approximation, for general $N$:

$$F_{S_N}(x) \approx \Phi\left(\frac{x - E(N)E(X_1)}{\sqrt{\mathrm{var}(N)(E(X_1))^2 + E(N)\,\mathrm{var}(X_1)}}\right). \tag{13.13}$$

Here, and in the approximations below, "$\approx$" has no specific mathematical
interpretation beyond "there exists a limit result justifying the right-hand side to be used as
approximation of the left-hand side". In particular, for the compound Poisson case
above, (13.13) reduces to

$$F_{S_N}(x) \approx \Phi\left(\frac{x - 100}{\sqrt{200}}\right), \tag{13.14}$$

where $\Phi$ is the standard normal df, as usual. It is this normal approximation that
is superimposed on the histogram in Figure 13.3. Clearly, there are conditions that
must be satisfied in order to obtain the approximation (13.13): for example, claims
should not be too heavy tailed (see Theorem 13.22).
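
For readers who want to evaluate (13.13) numerically, here is a minimal Python sketch using the standard library's `statistics.NormalDist`. The function name `normal_approx` is ours, and the inputs are the moments of the CPoi(100, Exp(1)) example; the quantile it returns is only the normal-approximation quantile, not the true quantile of $F_{S_N}$.

```python
import math
from statistics import NormalDist


def normal_approx(mean_n, var_n, mean_x, var_x):
    """Normal law with the mean and variance of S_N from (13.12), as used in (13.13)."""
    mean_s = mean_n * mean_x
    var_s = var_n * mean_x**2 + mean_n * var_x
    return NormalDist(mu=mean_s, sigma=math.sqrt(var_s))


# CPoi(100, Exp(1)): E(N) = var(N) = 100 and E(X_1) = var(X_1) = 1
approx = normal_approx(100.0, 100.0, 1.0, 1.0)

cdf_at_120 = approx.cdf(120.0)     # approximate value of F_{S_N}(120)
q999 = approx.inv_cdf(0.999)       # normal-approximation 99.9% quantile
print(cdf_at_120, q999)
```

Because the normal law is symmetric, this quantile tends to understate the upper tail of a positively skewed aggregate loss.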

For CPoi($\lambda$, $G$) it is not difficult to show that the skewness parameter satisfies

$$\frac{E((S_N - E(S_N))^3)}{(\mathrm{var}(S_N))^{3/2}} = \frac{E(X_1^3)}{\sqrt{\lambda (E(X_1^2))^3}} > 0 \tag{13.15}$$

(note that $X_1 \ge 0$ almost surely), so an approximation by a df with positive skewness
may improve the approximation (13.14), especially in the tail area. This is indeed
the case and leads to the next approximation.


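Translated-gamma approximation. We approximate $S_N$ by $k + Y$, where $k$ is
a translation parameter and $Y \sim \text{Ga}(\alpha, \beta)$ has a gamma distribution (see
Section A.2.4). The parameters $(k, \alpha, \beta)$ are found by matching the mean, the variance
and the skewness of $k + Y$ and $S_N$. It is not difficult to check that the following
equations result:

$$k + \frac{\alpha}{\beta} = \lambda E(X_1), \qquad \frac{\alpha}{\beta^2} = \lambda E(X_1^2), \qquad \frac{2}{\sqrt{\alpha}} = \frac{E(X_1^3)}{\sqrt{\lambda (E(X_1^2))^3}}.$$

In our case, where $\lambda = 100$ and $X_1$ has a standard exponential distribution, these
yield the equations $k + \alpha/\beta = 100$, $\alpha/\beta^2 = 200$ and $2/\sqrt{\alpha} = 0.2121$ with solution
$\alpha = 88.89$, $\beta = 0.67$, $k = -32.72$.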
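
The moment-matching step can be written out as a small Python function. This is a sketch under the assumption that the three equations are solved exactly in the order skewness, variance, mean; the function name and argument names are ours. Since $k$ is obtained as a difference of two larger numbers, it is sensitive to how $\beta$ is rounded, so small deviations from rounded figures quoted in the text are to be expected.

```python
import math


def translated_gamma_params(lam, m1, m2, m3):
    """Match mean, variance and skewness of k + Y, Y ~ Ga(alpha, beta) (rate beta),
    to those of CPoi(lam, G); m1, m2, m3 are the first three raw moments of G."""
    skew = m3 / math.sqrt(lam * m2**3)        # right-hand side of (13.15)
    alpha = (2.0 / skew) ** 2                 # from 2 / sqrt(alpha) = skewness
    beta = math.sqrt(alpha / (lam * m2))      # from alpha / beta^2 = lam * m2
    k = lam * m1 - alpha / beta               # from k + alpha / beta = lam * m1
    return k, alpha, beta


# CPoi(100, Exp(1)): raw moments of Exp(1) are 1, 2 and 6
k, alpha, beta = translated_gamma_params(100.0, 1.0, 2.0, 6.0)
print(k, alpha, beta)
```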


Commentary on these approximations. Both approximations work reasonably well
for the bulk of the data. However, for risk-management purposes we are mainly
interested in upper tail risk; in Figure 13.4 we have therefore plotted both
approximations for $x > 120$ on a log-log scale. This corresponds to the tail area beyond
the 90% quantile of $F_{S_N}$. Similar plots were routinely used in Chapter 5 on EVT
(see, for example, Figure 5.6). It becomes clear that, as can be expected, the gamma
approximation works better in this upper tail area where the normal approximation
underestimates the loss potential.

Of course, for loss data with heavier tails than exponential (lognormal or Pareto,
say), even the translated-gamma approximation will be insufficient, and other
approximations can be devised based on heavier-tailed distributions, such as
translated F, inverse gamma or generalized Pareto.

Another approach could be based on Monte Carlo simulation of aggregate losses
$S_N$ to which an appropriate heavy-tailed loss distribution could then be fitted.
One possible approach would be to model the tail of these simulated compound
losses with the GPD using the methodology of Section 5.2.2. This is what has
been done in Figures 13.5 and 13.6, where we have plotted various approximations
for CPoi(100, LN(1, 1)) and CPoi(100, Pa(4, 1)). The former corresponds to a
standard industry model for operational risk (see Frachot 2004). The latter corresponds
to a class of operational risk models used in Moscadelli (2004). The message of
these figures is clear: if the data satisfy the compound Poisson assumption, then the
GPD yields a superior fit for high quantiles.

Figure 13.4. Simulated CPoi(100, Exp(1)) data together with normal and
translated-gamma approximations (log-log scale). The 99.9% quantile estimates are also given.

Figure 13.5. Simulated CPoi(100, LN(1, 1)) data (n = 100000) with normal,
translated-gamma, GPD and Panjer recursion (see Example 13.18) approximations (log-log scale).

Figure 13.6. Simulated CPoi(100, Pa(4, 1)) data (n = 100000) with normal,
translated-gamma, and GPD approximations (log-log scale).

We now turn to an important class of approximations based on recursive methods.
In the case where the loss sizes $(X_i)$ are discrete and the distribution function of $N$
satisfies a specific condition (see Definition 13.11 below) a reliable recursive method
can be worked out.

Suppose that $X_1$ has a discrete distribution so that $P(X_1 \in \mathbb{N}_0) = 1$ with
$g_k = P(X_1 = k)$, $p_k = P(N = k)$ (for notational convenience we write $p_k$ for $p_N(k)$)
and $s_k = P(S_N = k)$. For simplicity assume that $g_0 = 0$ and let

$$g_k^{(n)} = P(X_1 + \cdots + X_n = k),$$

the discrete convolution of the probability mass function $g_k$. Note that, by definition,
$g_k^{(n+1)} = \sum_{i=1}^{k-1} g_i^{(n)} g_{k-i}$. We immediately obtain the following identities:

$$\begin{aligned}
s_0 &= P(S_N = 0) = P(N = 0) = p_0, \\
s_n &= P(S_N = n) = \sum_{k=1}^{\infty} p_k\, g_n^{(k)}, \quad n \ge 1,
\end{aligned} \tag{13.16}$$

where the latter formula corresponds to Proposition 13.4 but now in the discrete
case. As in Proposition 13.4 we note that (13.16) is difficult to calculate, mainly due
to the convolutions $g_n^{(k)}$. However, for an important class of counting variables $N$,
(13.16) can be reduced to a simple recursion. For this we introduce the so-called
Panjer classes.




Definition 13.11 (Panjer class). The probability mass function $(p_k)$ of $N$ belongs
to the Panjer($a$, $b$) class for some $a, b \in \mathbb{R}$ if the following relationship holds for
$r \ge 1$: $p_r = (a + (b/r))\,p_{r-1}$.


Example 13.12 (binomial). If $N \sim B(n, p)$, then its probability mass function is
$p_r = \binom{n}{r} p^r (1-p)^{n-r}$ for $0 \le r \le n$ and it can be easily checked that

$$\frac{p_r}{p_{r-1}} = -\frac{p}{1-p} + \frac{(n+1)p}{r(1-p)},$$

showing that $N$ belongs to the Panjer($a$, $b$) class with $a = -p/(1-p)$ and
$b = (n+1)p/(1-p)$.


Example 13.13 (Poisson). If $N \sim \text{Poi}(\lambda)$, then its probability mass function
$p_r = e^{-\lambda}\lambda^r/r!$ satisfies $p_r/p_{r-1} = \lambda/r$, so $N$ belongs to the Panjer($a$, $b$) class
with $a = 0$ and $b = \lambda$.


Example 13.14 (negative binomial). If $N$ has a negative binomial distribution,
denoted by $N \sim \text{NB}(\alpha, p)$, then its probability mass function is

$$p_r = \binom{\alpha + r - 1}{r} p^{\alpha} (1-p)^r, \quad r \ge 0,\ \alpha > 0,\ 0 < p < 1$$

(see Section A.2.7 for further details). We can easily check that

$$\frac{p_r}{p_{r-1}} = \frac{\alpha + r - 1}{r}(1-p) = 1 - p + \frac{(\alpha - 1)(1-p)}{r}.$$

Hence $N$ belongs to the Panjer($a$, $b$) class with $a = 1 - p$ and $b = (\alpha - 1)(1 - p)$.
In Proposition 13.21 we will show that the negative binomial model follows very
naturally from the Poisson model when one randomizes the intensity parameter of
the latter using a gamma distribution.
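
The three examples can be verified numerically by building each probability mass function from the relation $p_r = (a + b/r)p_{r-1}$ and comparing it with the closed form. The parameter values below are arbitrary illustrative choices, and `pmf_from_panjer` is our own helper; we write `q` for the negative binomial success parameter to avoid a clash with the binomial `p`.

```python
import math


def pmf_from_panjer(a, b, p0, r_max):
    """Build p_0, ..., p_{r_max} from the Panjer relation p_r = (a + b/r) p_{r-1}."""
    p = [p0]
    for r in range(1, r_max + 1):
        p.append((a + b / r) * p[-1])
    return p


# Poisson(lam): a = 0, b = lam, p_0 = exp(-lam)
lam = 2.5
poi = pmf_from_panjer(0.0, lam, math.exp(-lam), 10)
poi_direct = [math.exp(-lam) * lam**r / math.factorial(r) for r in range(11)]

# Binomial B(n, p): a = -p/(1-p), b = (n+1)p/(1-p), p_0 = (1-p)^n
n, p = 8, 0.3
bino = pmf_from_panjer(-p / (1 - p), (n + 1) * p / (1 - p), (1 - p) ** n, n)
bino_direct = [math.comb(n, r) * p**r * (1 - p) ** (n - r) for r in range(n + 1)]

# Negative binomial NB(alpha, q): a = 1-q, b = (alpha-1)(1-q), p_0 = q^alpha
alpha, q = 1.5, 0.4
nb = pmf_from_panjer(1 - q, (alpha - 1) * (1 - q), q**alpha, 10)
nb_direct = [
    math.exp(math.lgamma(alpha + r) - math.lgamma(alpha) - math.lgamma(r + 1))
    * q**alpha * (1 - q) ** r
    for r in range(11)
]

for name, rec, direct in (("Poisson", poi, poi_direct),
                          ("binomial", bino, bino_direct),
                          ("neg. binomial", nb, nb_direct)):
    print(name, max(abs(x - y) for x, y in zip(rec, direct)))
```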


Remark 13.15. One can show that, neglecting degenerate models for $(p_k)$, the above
three examples are the only counting distributions satisfying Definition 13.11. This
result goes back to Johnson and Kotz (1969) and was formulated explicitly in the
actuarial literature in Sundt and Jewell (1982).


Theorem 13.16 (Panjer recursion). Suppose that $N$ satisfies the Panjer($a$, $b$)
class condition and that $g_0 = P(X_1 = 0) = 0$. Then $s_0 = p_0$ and, for $r \ge 1$,
$s_r = \sum_{i=1}^{r} (a + (bi/r))\, g_i\, s_{r-i}$.


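Proof. We already know that $s_0 = p_0$ from (13.16), so suppose that $r \ge 1$.
Noting that $X_1, \ldots, X_n$ are iid, we require the following well-known identity for
exchangeable rvs:

$$E\Big(X_1 \,\Big|\, \sum_{i=1}^{n} X_i = r\Big) = \frac{1}{n}\sum_{j=1}^{n} E\Big(X_j \,\Big|\, \sum_{i=1}^{n} X_i = r\Big) = \frac{1}{n} E\Big(\sum_{j=1}^{n} X_j \,\Big|\, \sum_{i=1}^{n} X_i = r\Big) = \frac{r}{n}. \tag{13.17}$$

Moreover, using the fact that $g_0^{(n-1)} = 0$ for $n \ge 2$, we make the preliminary
calculation that

$$\begin{aligned}
p_{n-1}\sum_{i=1}^{r-1}\Big(a + \frac{bi}{r}\Big) g_i\, g_{r-i}^{(n-1)}
&= p_{n-1}\sum_{i=1}^{r}\Big(a + \frac{bi}{r}\Big) P\Big(X_1 = i,\ \sum_{j=2}^{n} X_j = r - i\Big) \\
&= p_{n-1}\sum_{i=1}^{r}\Big(a + \frac{bi}{r}\Big) P\Big(X_1 = i \,\Big|\, \sum_{j=1}^{n} X_j = r\Big)\, g_r^{(n)} \\
&= p_{n-1}\Big(a + \frac{b}{r} E\Big(X_1 \,\Big|\, \sum_{j=1}^{n} X_j = r\Big)\Big)\, g_r^{(n)} \\
&= p_{n-1}\Big(a + \frac{b}{n}\Big) g_r^{(n)} = p_n\, g_r^{(n)},
\end{aligned}$$

where (13.17) is used in the final step. Therefore, the identity (13.16) yields

$$\begin{aligned}
s_r = \sum_{n=1}^{\infty} p_n\, g_r^{(n)} &= p_1 g_r + \sum_{n=2}^{\infty} p_n\, g_r^{(n)} \\
&= (a + b)\, p_0\, g_r + \sum_{n=2}^{\infty}\sum_{i=1}^{r-1}\Big(a + \frac{bi}{r}\Big) g_i\, p_{n-1}\, g_{r-i}^{(n-1)} \\
&= (a + b)\, s_0\, g_r + \sum_{i=1}^{r-1}\Big(a + \frac{bi}{r}\Big) g_i \sum_{n=2}^{\infty} p_{n-1}\, g_{r-i}^{(n-1)} \\
&= (a + b)\, g_r\, s_0 + \sum_{i=1}^{r-1}\Big(a + \frac{bi}{r}\Big) g_i\, s_{r-i} \\
&= \sum_{i=1}^{r}\Big(a + \frac{bi}{r}\Big) g_i\, s_{r-i}.
\end{aligned}$$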

Remark 13.17. In the case of both the FFT method and the Panjer recursion,
an initial discretization of the loss df $G$ generally has to be made, which introduces
an approximation error. An in-depth discussion of discretization errors for the
computation of compound distributions is to be found in Grübel and Hermesmeier
(1999, 2000) (see also references therein for a comparison of these approaches). A
slight correction to Theorem 13.16 has to be made if $g_0 = P(X_1 = 0) > 0$. One
obtains $s_0 = \sum_{k=0}^{\infty} p_k g_0^k$ and, for $r \ge 1$,
$s_r = (1 - a g_0)^{-1} \sum_{i=1}^{r} (a + bi/r)\, g_i\, s_{r-i}$
(see Mikosch 2004, Theorem 3.3.10). We give further references in Notes and Comments.

Example 13.18 (Panjer recursion for the CPoi(100, LN(1, 1)) case). In Figure 13.5
we have included the Panjer approximation for the CPoi(100, LN(1, 1))
case. In order to apply Theorem 13.16 we first have to discretize the lognormal df.
An equispaced discretization of about 0.5 yields the Panjer approximation in
Figure 13.5, which is excellent for quantile values around 0.999, relevant for
applications. The 99.9% quantile estimate based on the Panjer recursion is 735, a value very
close to the GPD estimate. Far out in the tail, beyond 0.999, say, rounding errors
become important (the tail drifts off) and one has to be more careful; we give some
references in Notes and Comments on how to improve recursive methods far out in
the tail.
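
The FFT route mentioned in Remark 13.17 can be sketched in a similar spirit. For a severity that is already discrete, the discrete Fourier transform plays the role of the Laplace-Stieltjes transform in (13.11): one transforms the severity probabilities, applies the probability-generating function of $N$, and inverts. The code below assumes NumPy is available and treats the compound Poisson case with a small hypothetical severity distribution; the grid size is our own choice and must be large enough that wrap-around (aliasing) mass is negligible.

```python
import numpy as np


def compound_poisson_pmf_fft(lam, g, size):
    """pmf of S_N on {0, ..., size-1} for N ~ Poi(lam) and severity pmf g on {0, 1, ...}."""
    g_padded = np.zeros(size)
    g_padded[: len(g)] = g                      # zero-pad the severity pmf to the grid
    g_hat = np.fft.fft(g_padded)                # discrete transform of the severity
    s_hat = np.exp(-lam * (1.0 - g_hat))        # Pi_N applied to the transform, cf. (13.11)
    return np.real(np.fft.ifft(s_hat))          # invert; imaginary parts are rounding noise


lam = 5.0
g = np.array([0.0, 0.5, 0.3, 0.2])              # hypothetical severity pmf on {0, 1, 2, 3}
s = compound_poisson_pmf_fft(lam, g, size=256)

total = float(s.sum())
mean_s = float(np.sum(np.arange(len(s)) * s))   # compare with E(N) E(X_1) = 5 * 1.7
print(total, mean_s)
```

In applications the continuous severity df would first be discretized on an equispaced grid, which is the source of the approximation error discussed in the remark.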


13.2.4 Poisson Mixtures 


Poisson mixture models have been used in both credit and operational risk modelling; 
for an example in the latter case see Cruz (2002, Section 5.2.2) as well as that book’s 
jacket, which features a negative binomial distribution (a particular Poisson mixture 
model). Poisson mixtures have been used by actuaries for a long time; the negative 
binomial made its first appearance in the actuarial literature as the distribution of 
the number of repeated accidents suffered by an individual in a given time span (see 
Seal 1969). 

In Example 13.5 we introduced the compound Poisson model CPoi($\lambda$, $G$), where
$N \sim \text{Poi}(\lambda)$ counts the number of losses and $G$ is the loss severity df. One
disadvantage of the Poisson frequency distribution is that $\mathrm{var}(N) = \lambda = E(N)$, whereas
count data often exhibit so-called overdispersion, meaning that they indicate a model
where $\mathrm{var}(N) > E(N)$. A standard way to achieve this is by mixing the intensity $\lambda$
over some df $F_\Lambda(\lambda)$, i.e. assume that $\lambda > 0$ is a realization of a positive rv $\Lambda$ with
this df so that, by definition,

$$p_N(k) = P(N = k) = \int_0^\infty P(N = k \mid \Lambda = \lambda)\, dF_\Lambda(\lambda) = \int_0^\infty e^{-\lambda}\frac{\lambda^k}{k!}\, dF_\Lambda(\lambda). \tag{13.18}$$


Definition 13.19 (the mixed Poisson distribution). The rv $N$ with df (13.18) is
called a mixed Poisson rv with structure (or mixing) distribution $F_\Lambda$.

A consequence of the next result is that mixing leads to overdispersion.


Proposition 13.20. Suppose that $N$ is mixed Poisson with structure df $F_\Lambda$. Then
$E(N) = E(\Lambda)$ and $\mathrm{var}(N) = E(\Lambda) + \mathrm{var}(\Lambda)$, i.e. for $\Lambda$ non-degenerate, $N$ is
overdispersed.


Proof. One immediately obtains

$$E(N) = \sum_{k=0}^{\infty} k\, p_N(k) = \int_0^\infty \sum_{k=0}^{\infty} k\, e^{-\lambda}\frac{\lambda^k}{k!}\, dF_\Lambda(\lambda) = \int_0^\infty \lambda\, dF_\Lambda(\lambda) = E(\Lambda).$$

And, similarly,

$$E(N^2) = \sum_{k=0}^{\infty} k^2\, p_N(k) = E(\Lambda) + E(\Lambda^2),$$

so the result follows.
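
Proposition 13.20 is easy to illustrate by simulation. The sketch below mixes the Poisson intensity over a gamma distribution with hypothetical parameters $\alpha = 4$ and rate $\beta = 0.5$, so that $E(\Lambda) = 8$ and $\mathrm{var}(\Lambda) = 16$, and compares the sample mean and variance of $N$ with 8 and 24. Note that Python's `random.gammavariate` is parametrized by shape and scale, so the rate $\beta$ enters as `1 / beta`; the helper names and seed are ours.

```python
import random
import statistics


def poisson_draw(lam, rng):
    """Draw N ~ Poi(lam) by counting Exp(lam) inter-event times that fall in [0, 1]."""
    count, t = 0, rng.expovariate(lam)
    while t <= 1.0:
        count += 1
        t += rng.expovariate(lam)
    return count


rng = random.Random(11)                 # arbitrary seed
alpha, beta = 4.0, 0.5                  # hypothetical Ga(alpha, beta), rate parametrization

counts = []
for _ in range(20_000):
    lam = rng.gammavariate(alpha, 1.0 / beta)   # stdlib uses shape and *scale*
    counts.append(poisson_draw(lam, rng))       # N | Lambda = lam ~ Poi(lam)

mean_n = statistics.fmean(counts)
var_n = statistics.variance(counts)

mean_lambda = alpha / beta              # E(Lambda) = 8
var_lambda = alpha / beta**2            # var(Lambda) = 16
print(mean_n, mean_lambda)
print(var_n, mean_lambda + var_lambda)  # overdispersion: var(N) > E(N)
```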


We now give a concrete example of a mixed Poisson distribution, which is particularly
important in both operational risk and credit risk modelling. Indeed we have
already used the following result when describing the industry credit risk model
CreditRisk+ in Section 11.2.5.


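Proposition 13.21 (negative binomial as Poisson mixture). Suppose that the rv
$N$ has a mixed Poisson distribution with a gamma-distributed mixing variable
$\Lambda \sim \text{Ga}(\alpha, \beta)$. Then $N$ has a negative binomial distribution $N \sim \text{NB}(\alpha, \beta/(\beta + 1))$.


Proof. Using the definition of a gamma distribution in Section A.2.4 we have

$$P(N = k) = \int_0^\infty e^{-\lambda}\frac{\lambda^k}{k!}\,\frac{\beta^\alpha}{\Gamma(\alpha)}\lambda^{\alpha-1} e^{-\beta\lambda}\, d\lambda = \frac{\beta^\alpha}{k!\,\Gamma(\alpha)}\int_0^\infty \lambda^{\alpha+k-1} e^{-(\beta+1)\lambda}\, d\lambda.$$

Substituting $u = (\beta + 1)\lambda$, the integral can be evaluated to be

$$\int_0^\infty (\beta + 1)^{-(\alpha+k)} u^{\alpha+k-1} e^{-u}\, du = \frac{\Gamma(\alpha + k)}{(\beta + 1)^{\alpha+k}}.$$

This yields

$$P(N = k) = \Big(\frac{\beta}{\beta + 1}\Big)^{\alpha}\Big(\frac{1}{\beta + 1}\Big)^{k}\frac{\Gamma(\alpha + k)}{k!\,\Gamma(\alpha)}.$$

Using the relation $\Gamma(\alpha + k) = (\alpha + k - 1)\cdots\alpha\,\Gamma(\alpha)$, we see that this is equal to
the probability mass function of a negative binomial rv with $p := \beta/(\beta + 1)$ (see
Section A.2.7).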


Recall the definition of compound sums from Section 13.2.2 (Assumption 13.3 
and Proposition 13.4). In the special case of mixed Poisson rvs, compounding leads 
to so-called compound mixed Poisson distributions, such as the compound negative 
binomial distribution of Example 13.6. There is much literature on dfs of this type 
(see Notes and Comments). 
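
The identity in Proposition 13.21 can also be checked numerically without simulation. The sketch below approximates the mixing integral (13.18) for a gamma structure distribution by a simple midpoint rule and compares the result with the negative binomial probabilities for $p = \beta/(\beta + 1)$. The parameter values, the integration range and the step count are our own illustrative choices, and a midpoint rule is used only because it needs no external library.

```python
import math


def gamma_density(lam, alpha, beta):
    """Density of Ga(alpha, beta) in the rate parametrization, evaluated at lam > 0."""
    return math.exp(alpha * math.log(beta) - math.lgamma(alpha)
                    + (alpha - 1.0) * math.log(lam) - beta * lam)


def mixed_poisson_pmf(k, alpha, beta, upper=80.0, steps=80_000):
    """Midpoint-rule approximation of (13.18) with Lambda ~ Ga(alpha, beta)."""
    h = upper / steps
    total = 0.0
    for j in range(steps):
        lam = (j + 0.5) * h
        poisson = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
        total += poisson * gamma_density(lam, alpha, beta) * h
    return total


def negbin_pmf(k, alpha, p):
    """P(N = k) for N ~ NB(alpha, p), using log-gamma for the generalized binomial coefficient."""
    log_coeff = math.lgamma(alpha + k) - math.lgamma(alpha) - math.lgamma(k + 1)
    return math.exp(log_coeff + alpha * math.log(p) + k * math.log(1.0 - p))


alpha, beta = 2.5, 0.5                    # hypothetical gamma parameters
p = beta / (beta + 1.0)
pairs = [(mixed_poisson_pmf(k, alpha, beta), negbin_pmf(k, alpha, p)) for k in range(6)]
for k, (mixed, nb) in enumerate(pairs):
    print(k, mixed, nb)
```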


13.2.5 Tails of Aggregate Loss Distributions 


In Section 5.1.2 we defined the class of rvs with regularly varying or power tails. If
the (claim size) df $G$ is regularly varying with index $\alpha > 0$, then there exists a slowly
varying function $L$ (Definition 5.7) such that $\bar{G}(x) = 1 - G(x) = x^{-\alpha}L(x)$. The
next result shows that, for a wide class of counting dfs $(p_N(k))$, the df of the
compound sum $S_N$, $F_{S_N}$, inherits the power-like behaviour of $G$.


Theorem 13.22 (power-like behaviour of compound-sum distribution). Suppose
that $S_N$ is a compound sum with $E(N) = \lambda$ and suppose that there exists an $\varepsilon > 0$
such that $\sum_{k=0}^{\infty} (1 + \varepsilon)^k p_N(k) < \infty$. If $\bar{G}(x) = x^{-\alpha}L(x)$ with $\alpha > 0$ and $L$ slowly
varying, then

$$\lim_{x\to\infty} \frac{\bar{F}_{S_N}(x)}{\bar{G}(x)} = \lambda,$$

so $F_{S_N}$ inherits the power-like behaviour of $G$.


Proof. This result holds more generally for subexponential dfs; a proof together
with further discussion can be found in Embrechts, Klüppelberg and Mikosch (1997,
Section 1.3.3).


Example 13.23 (negative binomial). It is not difficult to show that the negative
binomial case satisfies the condition on $N$ in Theorem 13.22. The kind of argument
that is required is to be found in Embrechts, Klüppelberg and Mikosch (1997,
Example 1.3.11). Hence, if $\bar{G}(x) = x^{-\alpha}L(x)$, the tail of the compound-sum df behaves
like the tail of $G$, i.e.

$$\bar{F}_{S_N}(x) \sim E(N)\, x^{-\alpha} L(x), \quad \text{as } x \to \infty.$$

(For details, see Embrechts, Klüppelberg and Mikosch (1997, Section 1.3.3).)


Under the conditions of Theorem 13.22 the asymptotic behaviour of $\bar{F}_{S_N}(x)$ in
the case of a Pareto loss df is again Pareto with the same index. This is clearly seen
in Figure 13.6 in the linear behaviour of the simulated losses as well as the fitted
GPD. In the case of Figure 13.5, one can show that $\bar{F}_{S_N}(x)$ decays like a lognormal
tail; see the reference given in the proof of Theorem 13.22 for details. Note that the
GPD is able to pick up the features of the tail in both cases.


13.2.6 The Homogeneous Poisson Process 


In the previous sections we looked at counting rvs $N$ over a fixed time interval $[0, 1]$,
say. Without any additional difficulty we could have looked at $N(t)$, counting the
number of events in $[0, t]$ for $t \ge 0$. In the Poisson case this would correspond to
$N(t) \sim \text{Poi}(\lambda t)$; hence, for fixed $t$ and on replacing $\lambda$ by $\lambda t$, all of the previous
results concerning Poi($\lambda$) rvs can be suitably adapted.

In this section we want to integrate the rvs N (t), t > 0, into a stochastic process 
framework. The less mathematically trained reader should realize that there is a big 
difference between a family of rvs indexed by time, for which we only specify the 
one-dimensional dfs (which is what we have done so far), and a stochastic process 
with a specific structure in which these rvs are embedded. This difference is akin to 
the difference between marginal and joint distributions, a topic we have highlighted 
as very important in Chapter 7 through the notion of copulas; of course, in the 
stochastic process case, there also has to be some probabilistic consistency across 
time. In a certain sense, the finite-dimensional problem of Chapter 7 becomes an 
infinite-dimensional problem. 

After these words of warning on the difference between rvs and stochastic processes,
we now take some methodological shortcuts to arrive at our goal. The interested
reader wanting to learn more will have to delve deeper into the mathematical
background of stochastic processes in general and counting processes in particular.
The Notes and Comments section contains some references.

Figure 13.7. A sample path of a counting process.


Definition 13.24 (counting processes). A stochastic process $N = (N(t))_{t \ge 0}$ is
a counting process if its sample paths are right continuous with left limits existing
and if there exists a sequence of rvs $T_0 = 0, T_1, T_2, \ldots$ tending almost surely to $\infty$
such that $N(t) = \sum_{k=1}^{\infty} I_{\{T_k \le t\}}$.

A typical realization of such a process is given in Figure 13.7. We now define the 
homogeneous Poisson process as a special counting process. 


Definition 13.25 (homogeneous Poisson process). A stochastic process
$N = (N(t))_{t \ge 0}$ is a homogeneous Poisson process with intensity (rate) $\lambda > 0$ if the
following properties hold:

(i) $N$ is a counting process;
(ii) $N(0) = 0$, almost surely;
(iii) $N$ has stationary and independent increments; and
(iv) for each $t > 0$, $N(t) \sim \mathrm{Poi}(\lambda t)$.

Remark 13.26. Note that conditions (iii) and (iv) imply that, for $0 < u < v < t$,
the rvs $N(v) - N(u)$ and $N(t) - N(v)$ are independent and that, for $k \ge 0$,

$$P(N(v) - N(u) = k) = P(N(v - u) = k) = e^{-\lambda(v - u)} \frac{(\lambda(v - u))^k}{k!}.$$

The rv $N(v) - N(u)$ counts the number of events (claims, losses) in the interval
$(u, v]$; by stationarity, it has the same df as $N(v - u)$. In Figure 13.8 we have
generated ten realizations of a homogeneous Poisson process on $[0, 1]$ with $\lambda = 100$.
Note the rather narrow band within which the various sample paths fall.


For practical purposes, the following result contains the main properties of the 
homogeneous Poisson process. 

Figure 13.8. Ten realizations of a homogeneous Poisson process with $\lambda = 100$.


Theorem 13.27 (characterizations of the homogeneous Poisson process). Suppose
that $N$ is a counting process. The following statements are then equivalent:

(1) $N$ is a homogeneous Poisson process with rate $\lambda > 0$;

(2) $N$ has stationary and independent increments and

$$P(N(t) = 1) = \lambda t + o(t), \quad \text{as } t \downarrow 0,$$
$$P(N(t) \ge 2) = o(t), \quad \text{as } t \downarrow 0;$$

(3) the inter-event-times $(\Delta_k = T_k - T_{k-1})_{k \ge 1}$ are iid with distribution $\mathrm{Exp}(\lambda)$;
and

(4) for all $t > 0$, $N(t) \sim \mathrm{Poi}(\lambda t)$ and, given that $N(t) = k$, the occurrence times
$T_1, T_2, \ldots, T_k$ have the same distribution as the ordered sample from $k$ independent
rvs, uniformly distributed on $[0, t]$; as a consequence, we can write
the conditional joint density as

$$f_{T_1, \ldots, T_k \mid N(t) = k}(t_1, \ldots, t_k) = \frac{k!}{t^k} I_{\{0 < t_1 < \cdots < t_k < t\}}.$$


Proof. Many standard textbooks on stochastic processes contain proofs of this 
important theorem (see, for example, Mikosch 2004; Resnick 1992). 


Discussion. Statement (2) in Theorem 13.27 implies that $\lambda$ can indeed be interpreted
as a rate or intensity: $\lambda = \lim_{t \downarrow 0} (1/t) P(N(t) = 1)$. Moreover, the same
statement implies that a homogeneous Poisson process does not allow for clustering
of events: $\lim_{t \downarrow 0} P(N(t) \ge 2) = 0$. Statement (3) gives an event-time definition of
a homogeneous Poisson process. It follows immediately that the first event-time
has an $\mathrm{Exp}(\lambda)$ df: $P(T_1 > t) = P(N(t) = 0) = e^{-\lambda t}$, $t \ge 0$. Statement (3), however,
goes well beyond this by stating that the inter-event-times $\Delta_k$ are iid with
$\Delta_k \sim \mathrm{Exp}(\lambda)$. This leads to a straightforward way of simulating a stream of loss
events from a homogeneous Poisson process with rate $\lambda$. Moreover, this equivalent
definition immediately yields a generalization by assuming that the $\Delta_k$ are still iid
but that $\Delta_k \sim F_\Delta$, a general df. The resulting process is a so-called renewal process
(note that the only Markovian renewal process is the homogeneous Poisson process).

Finally, statement (4) yields an easy algorithm to generate the occurrences of
homogeneous Poisson times over the interval $[0, t]$ given that we have a total of
$k$ events up to $t$: we simply generate $k$ uniform rvs on $[0, t]$ and order them.
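
Statements (3) and (4) translate directly into simulation recipes. The following Python sketch is illustrative only: the function names, the seed and the choice $\lambda = 100$ on $[0, 1]$ (mirroring Figure 13.8) are assumptions made for this example, and only the standard library is used.

```python
import random


def poisson_times_by_gaps(rate, horizon, rng):
    """Event times on [0, horizon] built from iid Exp(rate) inter-event times
    (statement (3) of Theorem 13.27)."""
    times = []
    t = rng.expovariate(rate)
    while t <= horizon:
        times.append(t)
        t += rng.expovariate(rate)
    return times


def poisson_times_given_count(k, horizon, rng):
    """Event times on [0, horizon] given that exactly k events occurred:
    k iid uniforms on [0, horizon], sorted (statement (4))."""
    return sorted(rng.uniform(0.0, horizon) for _ in range(k))


if __name__ == "__main__":
    rng = random.Random(1)  # arbitrary seed, for reproducibility only
    counts = [len(poisson_times_by_gaps(100.0, 1.0, rng)) for _ in range(2000)]
    mean = sum(counts) / len(counts)
    var = sum((c - mean) ** 2 for c in counts) / (len(counts) - 1)
    print("sample mean and variance of N(1):", mean, var)
```

Over many replications the sample mean and variance of the count on $[0, 1]$ should both be close to $\lambda$, as expected for a $\mathrm{Poi}(\lambda)$ rv.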


Multivariate Poisson processes. In many applications we want to model the fre- 
quencies of different loss types with a number of Poisson processes while consid- 
ering possible dependence between loss frequencies for different loss types. More 
generally, we might want to construct a number of compound Poisson processes 
where loss severities for the different business lines were also dependent. A natural 
approach to modelling this dependence is to assume that all losses can be related 
to a series of underlying and independent Poisson shock processes. In insurance 
these shocks might be natural catastrophes; in credit risk modelling they might be 
a variety of economic events, such as local or global recessions; in operational risk 
modelling they might be the failure of various IT systems. When a shock occurs this 
may cause losses of several different types; the common shock causes the numbers 
of losses of each type to be dependent. See Lindskog and McNeil (2003), Pfeifer
and Nešlehová (2004) and Chavez-Demoulin and Embrechts (2004) for models of
this kind.
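
As a minimal sketch of the common shock idea, assume a single Poisson shock process with rate `shock_rate` and assume that each shock independently causes a loss of type $j$ with probability `hit_probs[j]`. These modelling choices and all names are hypothetical simplifications chosen for illustration; they are not taken from the models in the papers cited above.

```python
import random


def poisson_count(mean, rng):
    """Draw a Poisson(mean) variate by counting unit-rate exponential gaps in [0, mean]."""
    count, t = 0, rng.expovariate(1.0)
    while t <= mean:
        count += 1
        t += rng.expovariate(1.0)
    return count


def common_shock_counts(shock_rate, hit_probs, horizon, rng):
    """Loss counts per type on [0, horizon] driven by one shared shock process.

    Assumption: every shock triggers a loss of type j with probability
    hit_probs[j], independently across types and shocks.
    """
    n_shocks = poisson_count(shock_rate * horizon, rng)
    counts = [0] * len(hit_probs)
    for _ in range(n_shocks):
        for j, p in enumerate(hit_probs):
            if rng.random() < p:
                counts[j] += 1
    return counts


if __name__ == "__main__":
    rng = random.Random(2)  # arbitrary seed
    sample = [common_shock_counts(50.0, [0.5, 0.8], 1.0, rng) for _ in range(5000)]
    m0 = sum(s[0] for s in sample) / len(sample)
    m1 = sum(s[1] for s in sample) / len(sample)
    cov = sum((s[0] - m0) * (s[1] - m1) for s in sample) / (len(sample) - 1)
    print("means:", m0, m1, "covariance:", cov)
```

Under these assumptions each count is itself Poisson with mean `shock_rate * horizon * hit_probs[j]`, and the covariance between two types is `shock_rate * horizon * hit_probs[i] * hit_probs[j]`, so the shared shocks induce positive dependence between the loss frequencies.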


13.2.7 Processes Related to the Poisson Process 


Using the fundamental building block of the homogeneous Poisson process, one can 
construct more general counting processes that are useful for loss-event modelling 
in finance and insurance. Such generalizations include the following. 


Renewal processes (mentioned above). The exponential waiting time distribution
is replaced by a general df $F_\Delta$.


Inhomogeneous Poisson processes. The constant intensity $\lambda$ is replaced by a
deterministic function $\lambda(\cdot)$.


Mixed Poisson processes. The deterministic constant intensity $\lambda$ is replaced by an
rv $\Lambda$.


Doubly stochastic or Cox processes. $\lambda$ is replaced by a stochastic process
$\{\lambda_t : t \ge 0\}$ in accordance with notation used in Chapter 10 (see, for example,
Definition 10.15).


Self-exciting or Hawkes processes. $\lambda$ is replaced by a stochastic process depending
only on previous event-times. See Section 16.2.1 for a concrete example.


Below, we highlight some features of some of these processes. 
Inhomogeneous Poisson processes. 


Definition 13.28 (inhomogeneous Poisson). A counting process $N$ is an inhomogeneous
Poisson process if, for some deterministic function $\lambda(s) \ge 0$, the following
conditions hold:

(i) $N(0) = 0$, almost surely;
(ii) $N$ has independent increments; and
(iii) for all $t \ge 0$,

$$P(N(t + h) - N(t) = 1) = \lambda(t) h + o(h), \quad h \downarrow 0,$$
$$P(N(t + h) - N(t) \ge 2) = o(h), \quad h \downarrow 0.$$

The function $\lambda(\cdot)$ is referred to as the intensity or rate function. The integral
$\Lambda(t) = \int_0^t \lambda(s)\,ds$ is referred to as the intensity measure (or cumulative intensity
function).


Remark 13.29. A characterization theorem, similar to Theorem 13.27, can be
derived. In particular, we find that, for $0 < s < t$, $N(t) - N(s) \sim \mathrm{Poi}(\Lambda(t) - \Lambda(s))$.


The inhomogeneous Poisson process is a useful tool in loss modelling whenever a 
deterministic trend or seasonality component is to be modelled in the loss frequency. 
The next example also shows that this process naturally emerges as a counting 
process for record losses. 


Example 13.30 (records). The world of finance and insurance abounds with state- 
ments on record events: the largest single-day drop in the dollar/yen, the most 
expensive hurricane, the three best fund managers during the last year, the second 
largest loss due to internal fraud, the biggest one-day change in the credit spread of 
a particular company, etc. Likewise, the world of records is intimately related to the 
(general) theory of Poisson processes. In Notes and Comments we shall give several 
references for this. Below we indicate how an easy example related to a question on 
records leads to an inhomogeneous Poisson process as a model. 

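Suppose that the loss rvs $X_i > 0$ are iid with df $F$ and density function $f(x) > 0$, $x > 0$.
Define the counting process $N$:

$$N(t) = \sum_{i=1}^{\infty} I_{\{X_i \le t,\; X_i > \max(X_{i-1}, \ldots, X_1)\}}, \quad t \ge 0,$$

where the first observation $X_1$ is counted as a record.
$N(t)$ counts the number of records in the sequence $(X_i)_{i \ge 1}$ of size less than $t$,
and $(N(t))$ is referred to as the record process. It follows that, for $h, t > 0$,

$$P(N(t + h) - N(t) \ge 1) = \sum_{i=1}^{\infty} P(X_i \in (t, t + h] \text{ and } X_{i-1} \le t, \ldots, X_1 \le t)$$
$$= \sum_{i=1}^{\infty} (F(t + h) - F(t)) (F(t))^{i-1}$$
$$= \frac{F(t + h) - F(t)}{1 - F(t)}$$
$$= \frac{f(t)}{1 - F(t)} h + o(h), \quad \text{as } h \downarrow 0.$$

Moreover, for $h, t > 0$,

$$P(N(t + h) - N(t) \ge 2) \le \sum_{i < j} P(X_1 \le t, \ldots, X_{i-1} \le t,\; X_i \in (t, t + h],\; X_{i+1} \le t + h, \ldots, X_{j-1} \le t + h,\; X_j \in (t, t + h])$$
$$= \left( \int_t^{t+h} f(s)\,ds \right)^2 \sum_{i < j} (F(t))^{i-1} (F(t + h))^{j-i-1}$$
$$= o(h), \quad \text{as } h \downarrow 0.$$


From these calculations one deduces that the record process $N$ is inhomogeneous
Poisson with rate function $\lambda(t) = f(t)/(1 - F(t))$, the so-called hazard rate of $F$,
a notion that we encountered in Section 10.4.1.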
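
A quick numerical check of this result can be sketched as follows, under the assumption (made here only for illustration) that the losses are standard exponential. The hazard rate is then $\lambda(t) = 1$, so the number of records of size at most $t$ should be $\mathrm{Poi}(t)$ distributed. The function name and seed are hypothetical.

```python
import random


def record_count(t, rng):
    """Number of records of size at most t in an iid Exp(1) sequence.

    The sequence is only simulated up to the first observation exceeding t,
    because no later record can be smaller than or equal to t.
    """
    count, current_max = 0, 0.0
    while True:
        x = rng.expovariate(1.0)
        if x > t:
            return count
        if x > current_max:  # a new record below the level t
            count += 1
            current_max = x


if __name__ == "__main__":
    rng = random.Random(3)  # arbitrary seed
    t = 2.0
    counts = [record_count(t, rng) for _ in range(20000)]
    mean = sum(counts) / len(counts)
    var = sum((c - mean) ** 2 for c in counts) / (len(counts) - 1)
    print("sample mean and variance of N(t):", mean, var)
```

For a Poisson rv the mean and variance coincide, so both sample statistics should be close to $\Lambda(t) = t$ in this special case.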


Suppose now that, as in most practical cases, $\Lambda(t)$ is strictly increasing, so
$\Lambda(\Lambda^{-1}(t)) = \Lambda^{-1}(\Lambda(t)) = t$. We can then always transform an inhomogeneous
Poisson process $N$ with intensity measure $\Lambda$ into a homogeneous Poisson process
with intensity 1 by a change of time.


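Proposition 13.31 (time change, operational time). Suppose that $N$ is an
inhomogeneous Poisson process with $\Lambda$ strictly increasing, and define, for $t \ge 0$,
$\tilde{N}(t) = N(\Lambda^{-1}(t))$. We then have that $\tilde{N}$ is homogeneous Poisson with intensity 1.


Proof. For $t > 0$ fixed and $k \ge 0$,

$$P(\tilde{N}(t) = k) = P(N(\Lambda^{-1}(t)) = k) = e^{-\Lambda(\Lambda^{-1}(t))} \frac{(\Lambda(\Lambda^{-1}(t)))^k}{k!} = e^{-t} \frac{t^k}{k!},$$

so $\tilde{N}(t) \sim \mathrm{Poi}(t)$. By definition, the increments of $\tilde{N}$ are independent; moreover,
for $0 < u < v$ we have that

$$P(\tilde{N}(v) - \tilde{N}(u) = k) = P(N(\Lambda^{-1}(v)) - N(\Lambda^{-1}(u)) = k)$$
$$= e^{-(\Lambda(\Lambda^{-1}(v)) - \Lambda(\Lambda^{-1}(u)))} \frac{(\Lambda(\Lambda^{-1}(v)) - \Lambda(\Lambda^{-1}(u)))^k}{k!}$$
$$= e^{-(v - u)} \frac{(v - u)^k}{k!},$$

from which stationarity follows.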


This is one of the many examples in insurance and finance where a more complicated
process $N$ can be reduced to a standard (easier) model $\tilde{N}$ through the careful
choice of a new time clock (a so-called time change construction) (see also
Section 10.5.1 on credit risk). Proposition 13.31 can be formulated more generally for $\Lambda$
not strictly increasing, and the converse also holds. Proposition 13.31 justifies the
common simplifying assumption that a loss frequency model is homogeneous (unit
rate) Poisson, albeit in many cases only in operational time. The original time-scale
of $N$ is slowed down or speeded up in such a way that, on average, $\tilde{N}$ has one claim
per time unit, whereas $N$ has, on average, $\Lambda(1)$ claims.
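
The converse direction of the time change gives a simple simulation recipe: generate the arrival times of a unit-rate Poisson process in operational time and map them back through $\Lambda^{-1}$. The sketch below assumes, purely for illustration, the intensity $\lambda(t) = 2t$, so that $\Lambda(t) = t^2$ and $\Lambda^{-1}(s) = \sqrt{s}$; the function names are hypothetical.

```python
import math
import random


def cumulative_intensity(t):
    """Lambda(t) = t^2, the integral of the assumed intensity lambda(t) = 2t."""
    return t * t


def inverse_cumulative_intensity(s):
    """Inverse of Lambda: maps operational time back to the original time-scale."""
    return math.sqrt(s)


def inhomogeneous_times(horizon, rng):
    """Arrival times on [0, horizon] of the inhomogeneous process, obtained by
    transforming unit-rate Poisson arrivals S_i into T_i = Lambda^{-1}(S_i)."""
    times = []
    limit = cumulative_intensity(horizon)
    s = rng.expovariate(1.0)
    while s <= limit:
        times.append(inverse_cumulative_intensity(s))
        s += rng.expovariate(1.0)
    return times


def to_operational_time(times):
    """Map arrival times into operational time, where the process has unit rate."""
    return [cumulative_intensity(t) for t in times]


if __name__ == "__main__":
    rng = random.Random(4)  # arbitrary seed
    counts = [len(inhomogeneous_times(3.0, rng)) for _ in range(5000)]
    print("average count on [0, 3]:", sum(counts) / len(counts))
```

The average count on $[0, 3]$ should be close to $\Lambda(3) = 9$ in this example, in line with Remark 13.29.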


Remark 13.32. A standard way in which an inhomogeneous Poisson process can
be obtained from a homogeneous Poisson process is by random sampling. Suppose
an intensity function $\lambda$ satisfies $\lambda(s) \le c < \infty$ for $s \ge 0$. Start from a homogeneous
Poisson process with rate $c > 0$ and denote its arrival times by $T_0 = 0, T_1, T_2, \ldots$.
Construct a new process $N$ from $(T_i)_{i \ge 0}$ through deletion of each $T_i$ independently
of the other $T_j$ with probability $1 - (\lambda(T_i)/c)$. The so-called thinned counting
process $N$ consists of the remaining (undeleted) points. It can be shown that this
process is inhomogeneous Poisson with intensity function $\lambda(\cdot)$.
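
The thinning construction of Remark 13.32 can be sketched in a few lines of Python. The seasonal intensity $\lambda(s) = 50(1 + \sin(2\pi s))$ with bound $c = 100$ is an assumption chosen only to illustrate a seasonality effect, and the function names are hypothetical.

```python
import math
import random


def seasonal_intensity(s):
    """Assumed example intensity, bounded above by c = 100."""
    return 50.0 * (1.0 + math.sin(2.0 * math.pi * s))


def thinned_times(intensity, bound, horizon, rng):
    """Arrival times on [0, horizon] of an inhomogeneous Poisson process,
    obtained by thinning a homogeneous Poisson process with rate `bound`."""
    kept = []
    t = rng.expovariate(bound)
    while t <= horizon:
        # Keep the candidate point with probability intensity(t) / bound,
        # i.e. delete it with probability 1 - intensity(t) / bound.
        if rng.random() < intensity(t) / bound:
            kept.append(t)
        t += rng.expovariate(bound)
    return kept


if __name__ == "__main__":
    rng = random.Random(5)  # arbitrary seed
    counts = [len(thinned_times(seasonal_intensity, 100.0, 1.0, rng)) for _ in range(3000)]
    print("average count on [0, 1]:", sum(counts) / len(counts))
```

For this intensity $\Lambda(1) = 50$, so the average count over one period should be close to 50, with most events falling in the first half of the period where the intensity is high.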


Mixed Poisson processes. The mixed Poisson rvs of Section 13.2.4 can be embedded
into a so-called mixed Poisson process. A single realization of such a process cannot
be distinguished through statistical means from a realization of a homogeneous
Poisson process; indeed, to simulate a sample path, one first draws a realization
$\lambda = \Lambda(\omega)$ of the random intensity and then draws the sample path of the homogeneous
Poisson process with rate $\lambda$. (Here, $\Lambda$ denotes an rv and not the intensity measure
in the inhomogeneous Poisson case above.) Only by repeating this simulation many
times does one see the different probabilistic nature of the mixed Poisson process:
compare Figure 13.9 with Figure 13.8. In the former we have simulated ten
sample paths from a mixed Poisson process with mixing variable $\Lambda \sim \mathrm{Ga}(100, 1)$
so that $E(\Lambda) = 100$. Note the much greater variability in the paths.
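
The two-step simulation just described can be sketched as follows. The code assumes the parametrization in which $\mathrm{Ga}(\alpha, \beta)$ has mean $\alpha/\beta$ (so $\beta$ is a rate); Python's `random.gammavariate` expects a scale parameter, hence the argument `1.0 / beta`. Function names and the seed are illustrative assumptions.

```python
import random


def poisson_count(mean, rng):
    """Draw a Poisson(mean) variate by counting unit-rate exponential gaps in [0, mean]."""
    count, t = 0, rng.expovariate(1.0)
    while t <= mean:
        count += 1
        t += rng.expovariate(1.0)
    return count


def mixed_poisson_count(alpha, beta, horizon, rng):
    """N(horizon) for a mixed Poisson process with Ga(alpha, beta) mixing variable."""
    intensity = rng.gammavariate(alpha, 1.0 / beta)  # step 1: draw the random intensity
    return poisson_count(intensity * horizon, rng)   # step 2: homogeneous Poisson count


if __name__ == "__main__":
    rng = random.Random(6)  # arbitrary seed
    counts = [mixed_poisson_count(100.0, 1.0, 1.0, rng) for _ in range(5000)]
    mean = sum(counts) / len(counts)
    var = sum((c - mean) ** 2 for c in counts) / (len(counts) - 1)
    print("sample mean and variance of N(1):", mean, var)
```

With $\Lambda \sim \mathrm{Ga}(100, 1)$ the count $N(1)$ has mean $E(\Lambda) = 100$ but variance $E(\Lambda) + \mathrm{var}(\Lambda) = 200$, twice the variance of the homogeneous Poisson case with $\lambda = 100$.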


Example 13.33. When counting processes are used in credit risk modelling the times
$T_k$ typically correspond to credit events, for instance defaults or downgradings. More
precisely, a credit event can be constructed as the first jump of a counting process $N$.
The df of the time to the credit event can be easily derived by observing that
$P(T_1 > t) = P(N(t) = 0)$. This probability can be calculated in a straightforward way for
a homogeneous Poisson process with intensity $\lambda$; we obtain $P(N(t) = 0) = e^{-\lambda t}$.
When $N$ is a mixed Poisson process with mixing df $F_\Lambda$ we obtain

$$P(T_1 > t) = P(N(t) = 0) = \int_0^{\infty} e^{-t\lambda}\, dF_\Lambda(\lambda) = \hat{F}_\Lambda(t),$$

the Laplace–Stieltjes transform of $F_\Lambda$ in $t$. In the special case when $\Lambda \sim \mathrm{Ga}(\alpha, \beta)$,
the negative binomial case treated in Proposition 13.21, one finds that

$$P(T_1 > t) = \int_0^{\infty} e^{-t\lambda} \frac{\beta^\alpha}{\Gamma(\alpha)} \lambda^{\alpha - 1} e^{-\beta\lambda}\, d\lambda$$
$$= \frac{\beta^\alpha}{\Gamma(\alpha)} (t + \beta)^{-\alpha} \int_0^{\infty} e^{-s} s^{\alpha - 1}\, ds$$
$$= \beta^\alpha (t + \beta)^{-\alpha}, \quad t \ge 0,$$

so that $T_1$ has a Pareto distribution $T_1 \sim \mathrm{Pa}(\alpha, \beta)$ (see Section A.2.8).
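
The Pareto survival function of the first event time can be checked by simulation. The sketch below assumes the illustrative values $\alpha = 3$, $\beta = 2$ and $t = 1$; given the mixing variable, the first event time is exponential with that rate. The function names are hypothetical.

```python
import random


def first_event_time(alpha, beta, rng):
    """First jump time of a mixed Poisson process with Ga(alpha, beta) mixing variable.

    gammavariate takes a scale parameter, so the rate beta enters as 1 / beta.
    """
    intensity = rng.gammavariate(alpha, 1.0 / beta)
    return rng.expovariate(intensity)


def pareto_survival(t, alpha, beta):
    """P(T_1 > t) = beta^alpha (t + beta)^(-alpha)."""
    return (beta / (t + beta)) ** alpha


if __name__ == "__main__":
    rng = random.Random(7)  # arbitrary seed
    alpha, beta, t = 3.0, 2.0, 1.0
    n = 20000
    empirical = sum(first_event_time(alpha, beta, rng) > t for _ in range(n)) / n
    print("empirical:", empirical, "theoretical:", pareto_survival(t, alpha, beta))
```

The empirical survival probability should agree with the closed-form value up to Monte Carlo error.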


Processes with stochastic intensity. A further important class of models is obtained
when $\lambda$ in the homogeneous Poisson case is replaced by a general stochastic process
$(\lambda_t)$, yielding a two-tier stochastic model or so-called doubly stochastic process.

For example, one could take $\lambda_t$ to be a diffusion or, alternatively, a finite-state
Markov chain. The latter case gives rise to a regime-switching model: in each state of
the Markov chain the intensity has a different constant level and the process remains
in that state for an exponential length of time, before jumping to another state.
In Figure 13.10 we have simulated the sample path of such a process randomly
switching between $\lambda = 10$ and $\lambda = 100$. In Section 10.5.1 we looked at doubly
stochastic random times, which correspond to the first jump of a doubly stochastic
Poisson process.

Figure 13.9. Ten realizations of a mixed Poisson process with $\Lambda \sim \mathrm{Ga}(100, 1)$.

Figure 13.10. Realization of a counting process with a regime switch from
$\lambda = 10$ to $\lambda = 100$.
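
A regime-switching counting process of the kind described above can be simulated along the following lines. This is a minimal sketch under stated assumptions: a two-state Markov chain with intensities 10 and 100, the same exit rate `switch_rate` from both states, and a start in the low-intensity state. None of these settings is claimed to match the simulation behind Figure 13.10.

```python
import random


def regime_switching_times(intensities, switch_rate, horizon, rng, state=0):
    """Event times on [0, horizon] of a two-state regime-switching Poisson process.

    The chain stays in a state for an Exp(switch_rate) holding time; within a
    regime, events arrive as a homogeneous Poisson process with that regime's rate.
    """
    times = []
    t = 0.0
    while t < horizon:
        end = min(t + rng.expovariate(switch_rate), horizon)
        s = t + rng.expovariate(intensities[state])
        while s <= end:
            times.append(s)
            s += rng.expovariate(intensities[state])
        # By the memoryless property the overshoot past `end` can be discarded.
        t = end
        state = 1 - state
    return times


if __name__ == "__main__":
    rng = random.Random(8)  # arbitrary seed
    path = regime_switching_times((10.0, 100.0), 2.0, 1.0, rng)
    print("number of events on [0, 1]:", len(path))
```

Repeated runs produce counts that vary far more than in the homogeneous case, because the time spent in the high-intensity regime is itself random.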


Notes and Comments 


The story behind the name insurance analytics is told in Embrechts (2002). A good 
place to start a search for actuarial literature is the website of the International Actu- 
arial Association: www.actuaries.org. Several interesting books can be found on 
the website of the Society of Actuaries, www.soa.org (whose offices happen to be 
on North Martingale Road, Schaumburg, Illinois). A standard Society of Actuaries 
textbook on actuarial mathematics is Bowers et al. (1986); financial economics for 
actuaries is to be found in Panjer and Boyle (1998). For our purposes, excellent texts
are Mikosch (2004) and Partrat and Besson (2004). A most readable and informative
(online) set of lecture notes, with many historical links, is Wüthrich (2013). Rolski
et al. (1999) gives a broad, more technical overview of the relevant stochastic
process models. In Chapters 7 and 8 we have given several references to actuar- 
ial tools relevant to the study of risk measures, including distortion risk measures 
(Section 8.2.1), comonotonicity (Section 7.2.1) and Fréchet bounds (Sections 7.2.1 
and 8.4.4). Finally, an overview of the state of the art of actuarial modelling is to be 
found in Teugels and Sundt (2004). 

Actuarial textbooks dealing in particular with the modelling of loss distributions 
in insurance are Hogg and Klugman (1984), Klugman, Panjer and Willmot (2012) 
and Klugman, Panjer and Willmot (2013). Besides the general references above, 
an early textbook discussion of the use of numerical methods for the calculation of 
the df of total loss amount rvs is Feilmeier and Bertram (1987); Bühlmann (1984)
contains a first comparison between the FFT method and Panjer recursion. More
extensive comparisons, taking rounding and discretization errors into account, are
found in Grübel and Hermesmeier (1999, 2000). A discussion of the use of the FFT
in insurance is given in Embrechts, Grübel and Pitts (1993). Algorithms for the
FFT are freely available on the web, as a search will quickly reveal. The original
paper by Panjer (1981) also contains a density version of Theorem 13.16. For an
application of Panjer recursion to credit risk measurement within the CreditRisk+
framework, see Credit Suisse Financial Products (1997). Based on Giese (2003),
Haaf, Reiss and Schoenmakers (2004) propose an alternative recursive method. For 
more recent work on Panjer recursion, especially in the multivariate case, see, for 
example, Hesselager (1996) and Sundt (1999, 2000). For a textbook treatment of 
recursive methods in insurance, see Sundt and Vernic (2009). 

Asymptotic approximation methods going beyond the normal approximation
(13.13) are known in statistics under the names Berry–Esséen, Edgeworth and
saddle-point. The former two are discussed, for example, in Embrechts, Klüppelberg
and Mikosch (1997) and are of more theoretical importance. The saddle-point
technique is very useful: see Jensen (1995) for an excellent summary, and Embrechts
et al. (1985) for an application to compound distributions. Gordy (2002) discusses
the importance of saddle-point methods for credit risk modelling, again within the
context of CreditRisk+. Wider applications within risk management can be found
in Studer (2001), Glasserman (2004) and Glasserman and Li (2003).

Poisson mixture models with insurance applications in mind are summarized in 
Grandell (1997) (see also Bening and Korolev 2002). In order to delve more deeply 
into the world of counting processes, one has to study the theory of point processes. 
Very comprehensive and readable accounts are Daley and Vere-Jones (2003) and 
Karr (1991). A study of this theory is both mathematically demanding and practically 
rewarding. Such models are being used increasingly in credit risk. The notion of 
time change is fundamental to many applications in insurance and finance; for an 
example of how it can be used to model operational risk, see Embrechts, Kaufmann 
and Samorodnitsky (2004). For its introduction into finance, see Ané and Geman
(2000) and Dacorogna et al. (2001). An excellent survey is to be found in Peeters
(2004).

What have we not included in our brief account of the elements of insurance 
analytics? We have not treated ruin theory and the general stochastic process theory 
of insurance risk, credibility theory, dynamic financial analysis, also referred to as 
dynamic solvency testing, or reinsurance, to name but a few omissions. 

The stochastic process theory of insurance risk has a long tradition. The first 
fundamental summary came through the pioneering work of Cramér (1994a,b). 
Bühlmann (1970) made the field popular to several generations of actuaries. This
early work has now been generalized in every way possible. A standard textbook on
ruin theory is Asmussen and Albrecher (2010). The modelling of large claims and
its consequences for ruin estimates can be found in Embrechts, Klüppelberg and
Mikosch (1997).

Credibility theory concerns premium calculation for non-homogeneous portfolios 
and has a very rich history rooted in non-life insurance mathematics. Its basic con- 
cepts were first developed by American actuaries in the 1920s; pioneering papers 
in this early period were Mowbray (1914) and Whitney (1918). Further important 
work is found in the papers of Bailey (1945), Robbins (1955, 1964) and Bühlmann
(1967, 1969, 1971). An excellent review article tracing the historical development
of the basic ideas is Norberg (1979); see also Jewell (1990) for a more recent review.
Various textbook versions exist: Bühlmann and Gisler (2005) give an authoritative
account of its actuarial usage and hint at applications to financial risk management. 

Dynamic financial analysis, also referred to as dynamic solvency testing, is 
a systematic approach, based on large-scale computer simulations, for the inte- 
grated financial modelling of non-life insurance and reinsurance companies aimed 
at assessing the risks and benefits associated with strategic decisions (see Blum 
2005; Blum and Dacorogna 2004). An easy introduction can be found in Kauf- 
mann, Gadmer and Klett (2001). The interested reader can consult the website of 
the Casualty Actuarial Society (www.casact.org/research/drm). 


Part IV 


Special Topics 


14 


Multivariate Time Series 


In this chapter we consider multivariate time-series models for multiple series of 
financial risk-factor change data, such as differenced logarithmic price series. The 
presentation closely parallels the presentation of the corresponding ideas for uni- 
variate time series in Chapter 4. 

An introduction to concepts in the analysis of multivariate time series and a dis- 
cussion of multivariate ARMA models is found in Section 14.1, while Section 14.2 
presents some of the more important examples of multivariate GARCH models. 


14.1 Fundamentals of Multivariate Time Series 


Among the fundamentals of multivariate time series discussed in this section are the 
concepts of cross-correlation, stationarity of multivariate time series, multivariate 
white noise processes and multivariate ARMA models. 


14.1.1 Basic Definitions 


A multivariate time-series model for multiple risk factors is a stochastic process
$(X_t)_{t \in \mathbb{Z}}$, i.e. a family of random vectors, indexed by the integers and defined on
some probability space $(\Omega, \mathcal{F}, P)$.


Moments of a multivariate time series. Assuming they exist, we define the mean
function $\mu(t)$ and the covariance matrix function $\Gamma(t, s)$ of $(X_t)_{t \in \mathbb{Z}}$ by

$$\mu(t) = E(X_t), \quad t \in \mathbb{Z},$$
$$\Gamma(t, s) = E((X_t - \mu(t))(X_s - \mu(s))'), \quad t, s \in \mathbb{Z}.$$

Analogously to the univariate case, we have $\Gamma(t, t) = \mathrm{cov}(X_t)$. By observing that
the elements $\gamma_{ij}(t, s)$ of $\Gamma(t, s)$ satisfy

$$\gamma_{ij}(t, s) = \mathrm{cov}(X_{t,i}, X_{s,j}) = \mathrm{cov}(X_{s,j}, X_{t,i}) = \gamma_{ji}(s, t), \qquad (14.1)$$

it is clear that $\Gamma(t, s) = \Gamma(s, t)'$ for all $t, s$. However, the matrix $\Gamma$ need not be
symmetric, so in general $\Gamma(t, s) \ne \Gamma(s, t)$, which is in contrast to the univariate
case. Lagged values of one of the component series can be more strongly correlated
with future values of another component series than vice versa. This property, when
observed in empirical data, is known as a lead-lag effect and is discussed in more
detail in Example 14.7.

Stationarity. The multivariate models we consider will be stationary in one or both
of the following senses.


Definition 14.1 (strict stationarity). The multivariate time series $(X_t)_{t \in \mathbb{Z}}$ is strictly
stationary if

$$(X_{t_1}', \ldots, X_{t_n}') \stackrel{d}{=} (X_{t_1 + k}', \ldots, X_{t_n + k}')$$

for all $t_1, \ldots, t_n, k \in \mathbb{Z}$ and for all $n \in \mathbb{N}$.


Definition 14.2 (covariance stationarity). The multivariate time series $(X_t)_{t \in \mathbb{Z}}$ is
covariance stationary (or weakly or second-order stationary) if the first two moments
exist and satisfy

$$\mu(t) = \mu, \quad t \in \mathbb{Z},$$
$$\Gamma(t, s) = \Gamma(t + k, s + k), \quad t, s, k \in \mathbb{Z}.$$

A strictly stationary multivariate time series with finite covariance matrix is covariance
stationary, but we note that, as in the univariate case, it is possible to define
infinite-variance processes (including certain multivariate ARCH and GARCH processes)
that are strictly stationary but not covariance stationary.


Serial correlation and cross-correlation in stationary multivariate time series. The
definition of covariance stationarity implies that for all $s, t$ we have $\Gamma(t - s, 0) = \Gamma(t, s)$,
so that the covariance between $X_t$ and $X_s$ only depends on their temporal
separation $t - s$, which is known as the lag. In contrast to the univariate case, the
sign of the lag is important.

For a covariance-stationary multivariate process we write the covariance matrix
function as a function of one variable: $\Gamma(h) := \Gamma(h, 0)$, $\forall h \in \mathbb{Z}$. Noting that
$\Gamma(0) = \mathrm{cov}(X_t)$, $\forall t$, we can now define the correlation matrix function of a
covariance-stationary process. For this definition we recall the operator $\Delta(\cdot)$, defined in (6.4),
which when applied to $\Sigma = (\sigma_{ij}) \in \mathbb{R}^{d \times d}$ returns a diagonal matrix with the values
$\sqrt{\sigma_{11}}, \ldots, \sqrt{\sigma_{dd}}$ on the diagonal.


Definition 14.3 (correlation matrix function). Writing $\Delta := \Delta(\Gamma(0))$, where
$\Delta(\cdot)$ is the operator defined in (6.4), the correlation matrix function $P(h)$ of a
covariance-stationary multivariate time series $(X_t)_{t \in \mathbb{Z}}$ is

$$P(h) := \Delta^{-1} \Gamma(h) \Delta^{-1}, \quad \forall h \in \mathbb{Z}. \qquad (14.2)$$

The diagonal entries $\rho_{ii}(h)$ of this matrix-valued function give the autocorrelation
function of the $i$th component series $(X_{t,i})_{t \in \mathbb{Z}}$. The off-diagonal entries give so-called
cross-correlations between different component series at different times. It
follows from (14.1) that $P(h) = P(-h)'$, but $P(h)$ need not be symmetric, and in
general $P(h) \ne P(-h)$.


White noise processes. As in the univariate case, multivariate white noise processes
are building blocks for practically useful classes of time-series model.

Definition 14.4 (multivariate white noise). $(X_t)_{t \in \mathbb{Z}}$ is multivariate white noise if
it is covariance stationary with correlation matrix function given by

$$P(h) = \begin{cases} P, & h = 0, \\ 0, & h \ne 0, \end{cases}$$

for some positive-definite correlation matrix $P$.


A multivariate white noise process with mean 0 and covariance matrix
$\Sigma = \mathrm{cov}(X_t)$ will be denoted by $\mathrm{WN}(0, \Sigma)$. Such a process has no cross-correlation
between component series, except for contemporaneous cross-correlation at lag
zero. A simple example is a series of iid random vectors with finite covariance
matrix, and this is known as a multivariate strict white noise.


Definition 14.5 (multivariate strict white noise). $(X_t)_{t \in \mathbb{Z}}$ is multivariate strict
white noise if it is a series of iid random vectors with finite covariance matrix.


A strict white noise process with mean 0 and covariance matrix $\Sigma$ will be denoted
by $\mathrm{SWN}(0, \Sigma)$.

The martingale-difference noise concept may also be extended to higher dimensions.
As before we assume that the time series $(X_t)_{t \in \mathbb{Z}}$ is adapted to some filtration
$(\mathcal{F}_t)$, typically the natural filtration $(\sigma(\{X_s : s \le t\}))$, which represents the
information available at time $t$.


Definition 14.6 (multivariate martingale difference). $(X_t)_{t \in \mathbb{Z}}$ has the multivariate
martingale-difference property with respect to the filtration $(\mathcal{F}_t)$ if $E|X_t| < \infty$ and

$$E(X_t \mid \mathcal{F}_{t-1}) = 0, \quad \forall t \in \mathbb{Z}.$$

The unconditional mean of such a process is obviously also zero and, if $\mathrm{cov}(X_t) < \infty$
for all $t$, the covariance matrix function satisfies $\Gamma(t, s) = 0$ for $t \ne s$. If the
covariance matrix is also constant for all $t$, then a process with the multivariate
martingale-difference property is a multivariate white noise process.


14.1.2 Analysis in the Time Domain 


We now assume that we have a random sample $X_1, \ldots, X_n$ from a covariance-stationary
multivariate time-series model $(X_t)_{t \in \mathbb{Z}}$. In the time domain we construct
empirical estimators of the covariance matrix function and the correlation matrix
function from this random sample.
The sample covariance matrix function is calculated according to

$$\hat{\Gamma}(h) = \frac{1}{n} \sum_{t=1}^{n-h} (X_{t+h} - \bar{X})(X_t - \bar{X})', \quad 0 \le h < n,$$

where $\bar{X} = \sum_{t=1}^{n} X_t / n$ is the sample mean, which estimates $\mu$, the mean of the
time series. Writing $\hat{\Delta} := \Delta(\hat{\Gamma}(0))$, where $\Delta(\cdot)$ is the operator defined in (6.4), the
sample correlation matrix function is

$$\hat{P}(h) = \hat{\Delta}^{-1} \hat{\Gamma}(h) \hat{\Delta}^{-1}, \quad 0 \le h < n.$$
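
The two estimators can be written out directly. The following pure-Python sketch is one possible implementation (function names are hypothetical); it stores the sample as a list of $n$ observations, each a list of $d$ component values, and returns $d \times d$ matrices as nested lists.

```python
import math
import random


def sample_cov_function(data, h):
    """Sample covariance matrix function at lag h (0 <= h < n), with divisor n."""
    n, d = len(data), len(data[0])
    mean = [sum(x[i] for x in data) / n for i in range(d)]
    gamma = [[0.0] * d for _ in range(d)]
    for t in range(n - h):
        for i in range(d):
            for j in range(d):
                gamma[i][j] += (data[t + h][i] - mean[i]) * (data[t][j] - mean[j])
    return [[g / n for g in row] for row in gamma]


def sample_corr_function(data, h):
    """Sample correlation matrix function: rescale by the lag-0 standard deviations."""
    gamma0 = sample_cov_function(data, 0)
    gamma_h = sample_cov_function(data, h)
    d = len(gamma0)
    sd = [math.sqrt(gamma0[i][i]) for i in range(d)]
    return [[gamma_h[i][j] / (sd[i] * sd[j]) for j in range(d)] for i in range(d)]


if __name__ == "__main__":
    # Artificial bivariate series in which component 1 leads component 2 by one step.
    rng = random.Random(9)  # arbitrary seed
    z = [rng.gauss(0.0, 1.0) for _ in range(2001)]
    data = [[z[t], z[t - 1] + 0.5 * rng.gauss(0.0, 1.0)] for t in range(1, 2001)]
    print(sample_corr_function(data, 1))
```

In the artificial example the entry in row 2, column 1 of the lag-1 matrix (the correlation between component 2 at time $t + 1$ and component 1 at time $t$) is large, while the entry in row 1, column 2 is close to zero, which illustrates why the sample correlation matrix function is not symmetric.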


The information contained in the elements $\hat{\rho}_{ij}(h)$ of the sample correlation
matrix function is generally displayed in the cross-correlogram, which is a $d \times d$
matrix of plots (see Figure 14.1 for an example). The $i$th diagonal plot in this
graphical display is simply the correlogram of the $i$th component series, given by
$\{(h, \hat{\rho}_{ii}(h)) : h = 0, 1, 2, \ldots\}$. For the off-diagonal plots containing the estimates
of cross-correlation there are various possible presentations and we will consider
the following convention: for $i < j$ we plot $\{(h, \hat{\rho}_{ij}(h)) : h = 0, 1, 2, \ldots\}$; for
$i > j$ we plot $\{(-h, \hat{\rho}_{ij}(h)) : h = 0, 1, 2, \ldots\}$. An interpretation of the meaning of
the off-diagonal pictures is given in Example 14.7.

It can be shown that for causal processes driven by multivariate strict white noise
innovations (see Section 14.1.3) the estimates that comprise the components of the
sample correlation matrix function $\hat{P}(h)$ are consistent estimates of the underlying
theoretical quantities. For example, if the data themselves are from an SWN process,
then the cross-correlation estimators $\hat{\rho}_{ij}(h)$ for $h \ne 0$ converge to zero as the sample
size is increased. However, results concerning the asymptotic distribution of
cross-correlation estimates are, in general, more complicated than the univariate result for
autocorrelation estimates given in Theorem 4.13. Some relevant theory is found in
Chapter 11 of Brockwell and Davis (1991) and Chapter 7 of Brockwell and Davis
(2002). It is standard to plot the off-diagonal pictures with Gaussian confidence
bands at $(-1.96/\sqrt{n}, 1.96/\sqrt{n})$, but these bands should be used as rough guidance
for the eye rather than being relied upon too heavily to draw conclusions.


Example 14.7 (cross-correlogram of trivariate index returns). In Figure 14.1 the 
cross-correlogram of daily log-returns is shown for the Dow Jones, Nikkei and Swiss 
Market indices for 26 July 1996-25 July 2001. Although every vector observation in 
this trivariate time series relates to the same trading day, the returns are of course not 
properly synchronized due to time zones. This picture therefore shows interpretable 
lead-lag effects that help us to understand the off-diagonal pictures in the cross- 
correlogram. 

Part (b) of the figure shows estimated correlations between the Dow Jones index
return on day $t + h$ and the Nikkei index return on day $t$, for $h \ge 0$; these estimates
are clearly small and lie mainly within the confidence band, with the obvious exception
of the correlation estimate for returns on the same trading day $\hat{\rho}_{12}(0) \approx 0.14$.
Part (d) shows estimated correlations between the Dow Jones index return on day
$t + h$ and the Nikkei index return on day $t$, for $h \le 0$; the estimate corresponding to
$h = -1$ is approximately 0.28 and can be interpreted as showing how the American
market leads the Japanese market. Comparing parts (c) and (g) we see, unsurprisingly,
that the American market also leads the Swiss market, so that returns on day
$t - 1$ in the former are quite strongly correlated with returns on day $t$ in the latter.


14.1.3 Multivariate ARMA Processes 


We provide a brief excursion into multivariate ARMA models to indicate how
the ideas of Section 4.1.2 generalize to higher dimensions. For daily data, capturing
multivariate ARMA effects is much less important than capturing multivariate
volatility effects (and dynamic correlation effects) through multivariate GARCH
modelling, but, for longer-period returns, the more traditional ARMA processes
become increasingly useful. In the econometrics literature they are more commonly
known as vector ARMA (or VARMA) processes.

Figure 14.1. Cross-correlogram of daily log-returns of the Dow Jones, Nikkei and Swiss
Market indices for 26 July 1996-25 July 2001 (see Example 14.7 for commentary).


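Definition 14.8 (VARMA process). Let $(\varepsilon_t)_{t \in \mathbb{Z}}$ be $\mathrm{WN}(0, \Sigma_\varepsilon)$. The process
$(X_t)_{t \in \mathbb{Z}}$ is a zero-mean VARMA($p$, $q$) process if it is a covariance-stationary process
satisfying difference equations of the form

$$X_t - \Phi_1 X_{t-1} - \cdots - \Phi_p X_{t-p} = \varepsilon_t + \Theta_1 \varepsilon_{t-1} + \cdots + \Theta_q \varepsilon_{t-q}, \quad \forall t \in \mathbb{Z},$$

for parameter matrices $\Phi_i$ and $\Theta_j$ in $\mathbb{R}^{d \times d}$. $(X_t)$ is a VARMA process with mean
$\mu$ if the centred series $(X_t - \mu)_{t \in \mathbb{Z}}$ is a zero-mean VARMA($p$, $q$) process.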


Consider a zero-mean VARMA($p$, $q$) process. For practical applications we again
consider only causal processes (see Section 4.1.2), which are processes where the
solution of the defining equations has a representation of the form

$$X_t = \sum_{i=0}^{\infty} \Psi_i \varepsilon_{t-i}, \qquad (14.3)$$

where $(\Psi_i)_{i \in \mathbb{N}_0}$ is a sequence of matrices in $\mathbb{R}^{d \times d}$ with absolutely summable
components, meaning that, for any $j$ and $k$,

$$\sum_{i=0}^{\infty} |\psi_{i,jk}| < \infty. \qquad (14.4)$$

i=0 
As in the univariate case (see Proposition 4.9) it can be verified by direct calculation
that such linear processes are covariance stationary. For $h \ge 0$ the covariance matrix
function is given by

$$\Gamma(t + h, t) = \mathrm{cov}(X_{t+h}, X_t) = E\left( \sum_{i=0}^{\infty} \Psi_i \varepsilon_{t+h-i} \sum_{j=0}^{\infty} \varepsilon_{t-j}' \Psi_j' \right).$$

Arguing much as in the univariate case it is easily shown that this depends only on
$h$ and not on $t$ and that it is given by

$$\Gamma(h) = \sum_{i=0}^{\infty} \Psi_{i+h} \Sigma_\varepsilon \Psi_i', \quad h = 0, 1, 2, \ldots, \qquad (14.5)$$

where $\Sigma_\varepsilon$ is the covariance matrix of the white noise $(\varepsilon_t)$.
The correlation matrix function is easily derived from (14.5) and (14.2).

The requirement that a VARMA process be causal imposes conditions on the
values that the parameter matrices $\Phi_i$ (in particular) and $\Theta_j$ may take. The theory is
quite similar to univariate ARMA theory. We will give a single useful example from
the VARMA class; this is the first-order vector autoregressive (or VAR(1)) model.


Example 14.9 (VAR(1) process). The first-order VAR process satisfies the set of
vector difference equations

$$X_t = \Phi X_{t-1} + \varepsilon_t, \quad \forall t. \tag{14.6}$$

It is possible to find a causal process satisfying (14.3) and (14.4) that is a solution
of (14.6) if and only if all eigenvalues of the matrix $\Phi$ are less than 1 in absolute
value. The causal process

$$X_t = \sum_{i=0}^{\infty} \Phi^i \varepsilon_{t-i} \tag{14.7}$$

is then the unique solution. This solution can be thought of as an infinite-order
vector moving-average process, a so-called VMA($\infty$) process. The covariance matrix
function of this process follows from (14.3) and (14.5) and is given by

$$\Gamma(h) = \sum_{i=0}^{\infty} \Phi^{i+h}\Sigma_\varepsilon(\Phi^i)' = \Phi^h\Gamma(0), \quad h = 0, 1, 2, \ldots.$$
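
The causality condition and the covariance matrix function of Example 14.9 can be checked numerically. The following is a minimal sketch, assuming a hypothetical bivariate $\Phi$ and $\Sigma_\varepsilon$ chosen only for illustration; the function names are not from the text, and $\Gamma(0)$ is approximated by truncating the infinite series.

```python
import numpy as np

# Hypothetical bivariate VAR(1) parameters, chosen only for illustration.
Phi = np.array([[0.5, 0.2],
                [0.1, 0.3]])
Sigma_eps = np.array([[1.0, 0.3],
                      [0.3, 2.0]])

def is_causal(Phi):
    """Causality check: all eigenvalues of Phi are less than 1 in absolute value."""
    return bool(np.max(np.abs(np.linalg.eigvals(Phi))) < 1.0)

def gamma0(Phi, Sigma_eps, n_terms=500):
    """Approximate Gamma(0) = sum_i Phi^i Sigma_eps (Phi^i)' by truncating the series."""
    G = np.zeros_like(Sigma_eps)
    P = np.eye(Phi.shape[0])          # holds Phi^i
    for _ in range(n_terms):
        G += P @ Sigma_eps @ P.T
        P = Phi @ P
    return G

def gamma(h, Phi, Sigma_eps):
    """Covariance matrix function Gamma(h) = Phi^h Gamma(0) for h >= 0."""
    return np.linalg.matrix_power(Phi, h) @ gamma0(Phi, Sigma_eps)

causal = is_causal(Phi)
G0 = gamma0(Phi, Sigma_eps)
G1 = gamma(1, Phi, Sigma_eps)
```

A useful sanity check is that $\Gamma(0)$ satisfies the fixed-point relation $\Gamma(0) = \Phi\Gamma(0)\Phi' + \Sigma_\varepsilon$, which follows directly from (14.6).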

In practice, full VARMA models are less common than models from the VAR
and VMA subfamilies, one reason being that identifiability problems arise when
estimating parameters. For example, we can have situations where the first-order
VARMA(1, 1) model $X_t - \Phi X_{t-1} = \varepsilon_t + \Theta\varepsilon_{t-1}$ can be rewritten as
$X_t - \Phi^* X_{t-1} = \varepsilon_t + \Theta^*\varepsilon_{t-1}$ for completely different parameter matrices $\Phi^*$ and $\Theta^*$
(see Tsay (2002, p. 323) for an example). Of the two subfamilies, VAR models are easier to estimate.
Fitting options for VAR models range from multivariate least-squares estimation
without strong assumptions concerning the distribution of the driving white noise,
to full ML estimation; models combining VAR and multivariate GARCH features
can be estimated using a conditional ML approach in a very similar manner to that
described for univariate models in Section 4.2.4.


Notes and Comments


Many standard texts on time series also handle the multivariate theory: see, for 
example, Brockwell and Davis (1991, 2002) or Hamilton (1994). A key reference 
aimed at an econometrics audience is Lütkepohl (1993). For examples in the area 
of finance see Tsay (2002) and Zivot and Wang (2003). 


14.2 Multivariate GARCH Processes 


In this section we first establish general notation for multivariate GARCH (or 
MGARCH) models before going on to consider models that are defined via their 
conditional correlation matrix in Section 14.2.2 and models that are defined via 
their conditional covariance matrix in Section 14.2.3. We also provide brief notes
on model estimation, strategies for dimension reduction, and the use of multivariate 
GARCH models in quantitative risk management. 


14.2.1 General Structure of Models 


For the following definition we recall that the Cholesky factor $\Sigma^{1/2}$ of a positive-definite
matrix $\Sigma$ is the lower-triangular matrix $A$ satisfying $AA' = \Sigma$ (see the discussion
at the end of Section 6.1).


Definition 14.10. Let $(Z_t)_{t\in\mathbb{Z}}$ be SWN$(0, I_d)$. The process $(X_t)_{t\in\mathbb{Z}}$ is said to be a
multivariate GARCH process if it is strictly stationary and satisfies equations of the
form

$$X_t = \Sigma_t^{1/2} Z_t, \quad t \in \mathbb{Z}, \tag{14.8}$$

where $\Sigma_t^{1/2} \in \mathbb{R}^{d\times d}$ is the Cholesky factor of a positive-definite matrix $\Sigma_t$ that is
measurable with respect to $\mathcal{F}_{t-1} = \sigma(\{X_s : s \leq t-1\})$, the history of the process
up to time $t - 1$.


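Conditional moments. It is easily calculated that a covariance-stationary process
of this type has the multivariate martingale-difference property

$$E(X_t \mid \mathcal{F}_{t-1}) = E(\Sigma_t^{1/2} Z_t \mid \mathcal{F}_{t-1}) = \Sigma_t^{1/2} E(Z_t) = 0,$$

and it must therefore be a multivariate white noise process, as argued in Section 14.1.
Moreover, $\Sigma_t$ is the conditional covariance matrix since

$$\begin{aligned}
\operatorname{cov}(X_t \mid \mathcal{F}_{t-1}) &= E(X_t X_t' \mid \mathcal{F}_{t-1}) \\
&= \Sigma_t^{1/2} E(Z_t Z_t') (\Sigma_t^{1/2})' \\
&= \Sigma_t^{1/2} (\Sigma_t^{1/2})' \\
&= \Sigma_t.
\end{aligned} \tag{14.9}$$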


The conditional covariance matrix $\Sigma_t$ in a multivariate GARCH model corresponds
to the squared volatility $\sigma_t^2$ in a univariate GARCH model. The use of the Cholesky
factor of $\Sigma_t$ to describe the relationship to the driving noise in (14.8) is not important,
and in fact any type of "square root" of $\Sigma_t$ could be used (such as the root
derived from a symmetric decomposition). (The only practical consequence is that
different choices would lead to different series of residuals when the model is fitted
in practice.) We denote the elements of $\Sigma_t$ by $\sigma_{t,ij}$ and also use the notation
$\sigma_{t,i} = \sqrt{\sigma_{t,ii}}$ to denote the conditional standard deviation (or volatility) of the $i$th
component series $(X_{t,i})_{t\in\mathbb{Z}}$.

We recall that we can write $\Sigma_t = \Delta_t P_t \Delta_t$, where

$$\Delta_t = \Delta(\Sigma_t) = \operatorname{diag}(\sigma_{t,1}, \ldots, \sigma_{t,d}), \quad P_t = \wp(\Sigma_t), \tag{14.10}$$

using the operator notation defined in (6.4) and (6.5). The diagonal matrix $\Delta_t$ is
known as the volatility matrix and $P_t$ is known as the conditional correlation matrix.
The art of building multivariate GARCH models is to specify the dependence of $\Sigma_t$
(or of $\Delta_t$ and $P_t$) on the past in such a way that $\Sigma_t$ always remains symmetric and
positive definite. A covariance matrix must of course be symmetric and positive
semidefinite, and in practice we restrict our attention to the positive-definite case.
This facilitates fitting, since the conditional distribution of $X_t \mid \mathcal{F}_{t-1}$ never has a
singular covariance matrix.
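
The decomposition in (14.10) can be illustrated with a short sketch. The function names `vol_matrix` and `corr_matrix` are hypothetical stand-ins for the operators $\Delta(\cdot)$ and $\wp(\cdot)$, and the matrix `Sigma_t` is an arbitrary example value.

```python
import numpy as np

def vol_matrix(Sigma):
    """Delta(Sigma): diagonal matrix of standard deviations."""
    return np.diag(np.sqrt(np.diag(Sigma)))

def corr_matrix(Sigma):
    """Correlation matrix extracted from Sigma, i.e. Delta^{-1} Sigma Delta^{-1}."""
    s = np.sqrt(np.diag(Sigma))
    return Sigma / np.outer(s, s)

# Hypothetical conditional covariance matrix at some time t.
Sigma_t = np.array([[4.0, 1.2],
                    [1.2, 1.0]])
Delta_t = vol_matrix(Sigma_t)
P_t = corr_matrix(Sigma_t)
```

Multiplying back, `Delta_t @ P_t @ Delta_t` recovers `Sigma_t`, which is the identity $\Sigma_t = \Delta_t P_t \Delta_t$.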


Unconditional moments. The unconditional covariance matrix $\Sigma$ of a process of
this type is given by

$$\Sigma = \operatorname{cov}(X_t) = E(\operatorname{cov}(X_t \mid \mathcal{F}_{t-1})) + \operatorname{cov}(E(X_t \mid \mathcal{F}_{t-1})) = E(\Sigma_t),$$

from which it can be calculated that the unconditional correlation matrix $P$ has
elements

$$\rho_{ij} = \frac{E(\sigma_{t,ij})}{\sqrt{E(\sigma_{t,ii})E(\sigma_{t,jj})}} = \frac{E(\rho_{t,ij}\sigma_{t,i}\sigma_{t,j})}{\sqrt{E(\sigma_{t,i}^2)E(\sigma_{t,j}^2)}}, \tag{14.11}$$

which is in general difficult to evaluate and is usually not simply the expectation of
the conditional correlation matrix.


Innovations. In practical work the innovations are generally taken to be from
either a multivariate Gaussian distribution ($Z_t \sim N_d(0, I_d)$) or, more realistically
for daily returns, an appropriately scaled spherical multivariate $t$ distribution
($Z_t \sim t_d(\nu, 0, (\nu - 2)I_d/\nu)$). Any distribution with mean $0$ and covariance matrix
$I_d$ is permissible, and appropriate members of the normal mixture family of
Section 6.2 or the spherical family of Section 6.3.1 may be considered.
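
The scaling of the $t$ distribution can be made concrete with a small simulation. This is a sketch under the standard normal variance mixture construction of the multivariate $t$; the function name `scaled_t_innovations` and the choice $\nu = 8$ are assumptions made for illustration.

```python
import numpy as np

def scaled_t_innovations(n, d, nu, rng):
    """Draw n innovations from t_d(nu, 0, ((nu-2)/nu) I_d), which has covariance I_d."""
    if nu <= 2:
        raise ValueError("nu must exceed 2 for the covariance matrix to exist")
    normals = rng.standard_normal((n, d))
    chi2 = rng.chisquare(nu, size=(n, 1))
    # N / sqrt(W/nu) is standard multivariate t with covariance nu/(nu-2) I_d;
    # the factor sqrt((nu-2)/nu) rescales that covariance to I_d.
    return np.sqrt((nu - 2.0) / nu) * normals / np.sqrt(chi2 / nu)

rng = np.random.default_rng(0)
Z = scaled_t_innovations(200_000, 3, nu=8.0, rng=rng)
sample_cov = np.cov(Z, rowvar=False)   # should be close to the identity matrix
```

Because all components share the same mixing variable, they are uncorrelated but not independent, which is relevant for the pure diagonal model discussed later in the section.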


Presentation of models. In the following sections we present some of the more 
important multivariate GARCH specifications. In doing this we concentrate on the 
following aspects of the models. 


• The form of the dynamic equations, with economic arguments and criticisms
  where appropriate.

• The conditions required to guarantee that the conditional covariance matrix
  $\Sigma_t$ remains positive definite. Other mathematical properties of these models,
  such as conditions for covariance stationarity, are difficult to derive with
  full mathematical rigour; references in Notes and Comments contain further
  information.

• The parsimoniousness of the parametrization. A major problem with most
  multivariate GARCH specifications is that the number of parameters tends to
  explode with the dimension of the model, making them unsuitable for analyses
  of many risk factors.

• Simple intuitive fitting methods where available. All models can be fitted
  by a general global-maximization approach described in Section 14.2.4 but
  certain models lend themselves to estimation in stages, particularly the models
  of Section 14.2.2.


14.2.2 Models for Conditional Correlation 


In this section we present models that focus on specifying the conditional correlation
matrix $P_t$ while allowing volatilities to be described by univariate GARCH models;
we begin with a popular and relatively parsimonious model where $P_t$ is assumed to
be constant for all $t$.


Constant conditional correlation (CCC). 


Definition 14.11. The process $(X_t)_{t\in\mathbb{Z}}$ is a CCC-GARCH process if it is a process
with the general structure given in Definition 14.10 such that the conditional
covariance matrix is of the form $\Sigma_t = \Delta_t P_c \Delta_t$, where

(i) $P_c$ is a constant, positive-definite correlation matrix; and

(ii) $\Delta_t$ is a diagonal volatility matrix with elements $\sigma_{t,k}$ satisfying

$$\sigma_{t,k}^2 = \alpha_{k0} + \sum_{i=1}^{p_k} \alpha_{ki} X_{t-i,k}^2 + \sum_{j=1}^{q_k} \beta_{kj}\sigma_{t-j,k}^2, \quad k = 1, \ldots, d, \tag{14.12}$$

where $\alpha_{k0} > 0$, $\alpha_{ki} \geq 0$, $i = 1, \ldots, p_k$, $\beta_{kj} \geq 0$, $j = 1, \ldots, q_k$.


The CCC-GARCH specification represents a simple way of combining univariate
GARCH processes. This can be seen by observing that in a CCC-GARCH model
observations and innovations are connected by equations $X_t = \Delta_t P_c^{1/2} Z_t$, which
may be rewritten as $X_t = \Delta_t \tilde{Z}_t$ for an SWN$(0, P_c)$ process $(\tilde{Z}_t)_{t\in\mathbb{Z}}$. Clearly, the
component processes are univariate GARCH.


Proposition 14.12. The CCC-GARCH model is well defined in the sense that $\Sigma_t$
is almost surely positive definite for all $t$. Moreover, it is covariance stationary if
and only if $\sum_{i=1}^{p_k} \alpha_{ki} + \sum_{j=1}^{q_k} \beta_{kj} < 1$ for $k = 1, \ldots, d$.


Proof. For a vector $v \neq 0$ in $\mathbb{R}^d$ we have

$$v'\Sigma_t v = (\Delta_t v)' P_c (\Delta_t v) > 0,$$

since $P_c$ is positive definite and the strict positivity of the individual volatility
processes ensures that $\Delta_t v \neq 0$ for all $t$.

If $(X_t)_{t\in\mathbb{Z}}$ is covariance stationary, then each component series $(X_{t,k})_{t\in\mathbb{Z}}$ is a
covariance-stationary GARCH process for which a necessary and sufficient condition
is $\sum_{i=1}^{p_k} \alpha_{ki} + \sum_{j=1}^{q_k} \beta_{kj} < 1$ by Proposition 4.21. Conversely, if the component
series are covariance stationary, then for all $i$ and $j$ the Cauchy-Schwarz inequality
implies

$$\sigma_{ij} = E(\sigma_{t,ij}) = \rho_{ij} E(\sigma_{t,i}\sigma_{t,j}) \leq \rho_{ij}\sqrt{E(\sigma_{t,i}^2)E(\sigma_{t,j}^2)} < \infty.$$

Since $(X_t)_{t\in\mathbb{Z}}$ is a multivariate martingale difference with finite, non-time-dependent
second moments $\sigma_{ij}$, it is a covariance-stationary white noise.


The CCC model is often a useful starting point from which to proceed to more 
complex models. In some empirical settings it gives an adequate performance, but it 
is generally accepted that the constancy of conditional correlation in this model is an 
unrealistic feature and that the impact of news on financial markets requires models 
that allow a dynamic evolution of conditional correlation as well as a dynamic 
evolution of volatilities. A further criticism of the model (which in fact applies 
to the majority of MGARCH specifications) is the fact that the individual volatility 
dynamics (14.12) do not allow for the possibility that large returns in one component 
series at a particular point in time can contribute to the increased volatility of another 
component time series at future points in time. 

To describe a simple method of fitting the CCC model, we introduce the notion of
a devolatized process. For any multivariate time-series process $X_t$, the devolatized
process is the process $Y_t = \Delta_t^{-1} X_t$, where $\Delta_t$ is, as usual, the diagonal matrix of
volatilities. In the case of a CCC model it is easily seen that the devolatized process
$(Y_t)_{t\in\mathbb{Z}}$ is an SWN$(0, P_c)$ process.

This structure suggests a simple two-stage fitting method in which we first estimate 
the individual volatility processes for the component series by fitting univariate 
GARCH processes to data X1,..., Xn; note that, although we have specified in 
Definition 14.11 that the individual volatilities should follow standard GARCH 
models, we could of course extend the model to allow any of the univariate models 
in Section 4.2.3 to be used, such as GARCH with leverage or threshold GARCH. 

In a second stage we construct an estimate of the devolatized process by taking
$\hat{Y}_t = \hat{\Delta}_t^{-1} X_t$ for $t = 1, \ldots, n$, where $\hat{\Delta}_t$ is the estimate of $\Delta_t$; in other words, we
collect the standardized residuals from the univariate GARCH models. Note that we
have also described this construction in slightly more general terms in the context
of a multivariate approach to dynamic historical simulation in Section 9.2.4. If the
CCC-GARCH model is adequate, then the $\hat{Y}_t$ data should behave like a realization
from an SWN$(0, P_c)$ process, and this can be investigated with the correlogram
and cross-correlogram applied to raw and absolute values. Assuming that the model
is adequate, the conditional correlation matrix $P_c$ can then be estimated from the
standardized residuals using methods from Chapter 6.
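
The structure of the CCC model and the second fitting stage can be sketched as follows. This is an illustration under stated assumptions rather than a full fitting routine: the GARCH(1,1) parameters and $P_c$ are hypothetical, the innovations are Gaussian, and the true volatilities stand in for the stage-one estimates $\hat{\Delta}_t$ that univariate GARCH fitting would provide.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 5000, 2
# Hypothetical GARCH(1,1) parameters per component: alpha_k0, alpha_k1, beta_k1.
alpha0 = np.array([0.05, 0.10])
alpha1 = np.array([0.10, 0.05])
beta1 = np.array([0.85, 0.90])
P_c = np.array([[1.0, 0.6],
                [0.6, 1.0]])
L = np.linalg.cholesky(P_c)

X = np.zeros((n, d))
sigma2 = np.zeros((n, d))
sigma2_prev = alpha0 / (1.0 - alpha1 - beta1)   # start at the unconditional variances
x_prev = np.zeros(d)
for t in range(n):
    # Equation (14.12) with p_k = q_k = 1, applied componentwise.
    sigma2[t] = alpha0 + alpha1 * x_prev**2 + beta1 * sigma2_prev
    z_tilde = L @ rng.standard_normal(d)        # SWN(0, P_c) innovation
    X[t] = np.sqrt(sigma2[t]) * z_tilde         # X_t = Delta_t * Ztilde_t
    x_prev, sigma2_prev = X[t], sigma2[t]

# Second stage: devolatize (true volatilities used in place of estimates)
# and estimate P_c by the sample correlation matrix of the devolatized data.
Y = X / np.sqrt(sigma2)
P_c_hat = np.corrcoef(Y, rowvar=False)
```

With real data the volatilities would come from fitted univariate models, and a robust correlation estimator could replace `np.corrcoef`.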

A special case of CCC-GARCH that we call a pure diagonal model occurs when
$P_c = I_d$. A covariance-stationary model of this kind is a multivariate white noise
where the contemporaneous components $X_{t,i}$ and $X_{t,j}$ are also uncorrelated for
$i \neq j$. This subsumes the case of independent GARCH models for each component
series. Indeed, if we assume that the driving SWN$(0, I_d)$ process is multivariate
Gaussian, then the component series are independent. However, if, for example, we
assume that the innovations satisfy $Z_t \sim t_d(\nu, 0, ((\nu - 2)/\nu)I_d)$, then the component
processes are dependent.


Dynamic conditional correlation (DCC). This model generalizes the CCC model 
to allow conditional correlations to evolve dynamically according to a relatively 
parsimonious scheme, but it is constructed in a way that still allows estimation in 
stages using univariate GARCH models. Its formal analysis as a stochastic process is
difficult due to the use of the correlation matrix extraction operator $\wp$ in its definition.


Definition 14.13. The process $(X_t)_{t\in\mathbb{Z}}$ is a DCC-GARCH process if it is a process
with the general structure given in Definition 14.10, where the volatilities comprising
$\Delta_t$ follow univariate GARCH specifications as in (14.12) and the conditional
correlation matrices $P_t$ satisfy, for $t \in \mathbb{Z}$, the equations

$$P_t = \wp\biggl(\Bigl(1 - \sum_{i=1}^{p} \alpha_i - \sum_{j=1}^{q} \beta_j\Bigr)P_c + \sum_{i=1}^{p} \alpha_i Y_{t-i}Y_{t-i}' + \sum_{j=1}^{q} \beta_j P_{t-j}\biggr), \tag{14.13}$$

where $P_c$ is a positive-definite correlation matrix, $\wp$ is the operator in (6.5),
$Y_t = \Delta_t^{-1} X_t$ denotes the devolatized process, and the coefficients satisfy $\alpha_i \geq 0$, $\beta_j \geq 0$
and $\sum_{i=1}^{p} \alpha_i + \sum_{j=1}^{q} \beta_j < 1$.

Observe first that if all the $\alpha_i$ and $\beta_j$ coefficients in (14.13) are zero, then the model
reduces to the CCC model. If one makes an analogy with a covariance-stationary
univariate GARCH model with unconditional variance $\sigma^2$, for which the volatility
equation can be written

$$\sigma_t^2 = \Bigl(1 - \sum_{i=1}^{p} \alpha_i - \sum_{j=1}^{q} \beta_j\Bigr)\sigma^2 + \sum_{i=1}^{p} \alpha_i X_{t-i}^2 + \sum_{j=1}^{q} \beta_j \sigma_{t-j}^2,$$

then the correlation matrix $P_c$ in (14.13) can be thought of as representing the long-run
correlation structure. Although this matrix could be estimated by fitting the
DCC model to data by ML estimation in one step, it is quite common to estimate it
using an empirical correlation matrix calculated from the devolatized data, as in the
estimation of the CCC model.

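Observe also that the dynamic equation (14.13) preserves the positive definiteness
of $P_t$. If we define

$$Q_t := \Bigl(1 - \sum_{i=1}^{p} \alpha_i - \sum_{j=1}^{q} \beta_j\Bigr)P_c + \sum_{i=1}^{p} \alpha_i Y_{t-i}Y_{t-i}' + \sum_{j=1}^{q} \beta_j P_{t-j}$$

and assume that $P_{t-q}, \ldots, P_{t-1}$ are positive definite, then it follows that, for a vector
$v \neq 0$ in $\mathbb{R}^d$, we have

$$v'Q_t v = \Bigl(1 - \sum_{i=1}^{p} \alpha_i - \sum_{j=1}^{q} \beta_j\Bigr)v'P_c v + \sum_{i=1}^{p} \alpha_i v'Y_{t-i}Y_{t-i}'v + \sum_{j=1}^{q} \beta_j v'P_{t-j}v > 0,$$

since the first term is strictly positive and the second and third terms are non-negative.
If $Q_t$ is positive definite, then so is $P_t$.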
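
The recursion (14.13) and the positive-definiteness argument can be sketched for the first-order case. The function names, the starting values for the recursion and the placeholder devolatized data are assumptions made for illustration only.

```python
import numpy as np

def corr_from_cov(Q):
    """Correlation matrix extraction: rescale Q to have unit diagonal."""
    s = np.sqrt(np.diag(Q))
    return Q / np.outer(s, s)

def dcc_correlations(Y, P_c, alpha, beta):
    """First-order version of (14.13): P_t is the correlation matrix of
    Q_t = (1 - alpha - beta) P_c + alpha Y_{t-1} Y_{t-1}' + beta P_{t-1}."""
    n, d = Y.shape
    P = np.empty((n, d, d))
    P_prev = P_c.copy()        # assumed starting value for the recursion
    y_prev = np.zeros(d)       # assumed pre-sample devolatized observation
    for t in range(n):
        Q = (1 - alpha - beta) * P_c + alpha * np.outer(y_prev, y_prev) + beta * P_prev
        P[t] = corr_from_cov(Q)
        P_prev, y_prev = P[t], Y[t]
    return P

rng = np.random.default_rng(2)
P_c = np.array([[1.0, 0.3],
                [0.3, 1.0]])
Y = rng.standard_normal((500, 2))     # placeholder devolatized data
P_path = dcc_correlations(Y, P_c, alpha=0.05, beta=0.90)
min_eig = min(np.linalg.eigvalsh(P_t).min() for P_t in P_path)
```

The smallest eigenvalue along the path stays strictly positive, in line with the argument above.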

The usual estimation method for the DCC model is as follows.

(1) Fit univariate GARCH-type models to the component series to estimate the
    volatility matrix $\Delta_t$. Form an estimated realization of the devolatized process
    by taking $\hat{Y}_t = \hat{\Delta}_t^{-1} X_t$.

(2) Estimate $P_c$ by taking the sample correlation matrix of the devolatized data
    (or, better still, some robust estimator of correlation).

(3) Estimate the remaining parameters $\alpha_i$ and $\beta_j$ in equation (14.13) by fitting
    a model with structure $Y_t = P_t^{1/2} Z_t$ to the devolatized data. We leave this
    step vague for the time being and note that this will be a simple application
    of the methodology for fitting general multivariate GARCH models in
    Section 14.2.4; in a first-order model ($p = q = 1$), there will be only two
    remaining parameters to estimate.


14.2.3 Models for Conditional Covariance 


The models of this section specify explicitly a dynamic structure for the conditional
covariance matrix $\Sigma_t$. These models are not designed for multiple-stage estimation
based on univariate GARCH estimation procedures.


Vector GARCH models (VEC and DVEC). The most general vector GARCH 
model—the VEC model—has too many parameters for practical purposes and our 
task will be to simplify the model by imposing various restrictions on parameter 
matrices. 


Definition 14.14 (VEC model). The process $(X_t)_{t\in\mathbb{Z}}$ is a VEC process if it has
the general structure given in Definition 14.10, and the dynamics of the conditional
covariance matrix $\Sigma_t$ are given by the equations

$$\operatorname{vech}(\Sigma_t) = a_0 + \sum_{i=1}^{p} A_i \operatorname{vech}(X_{t-i}X_{t-i}') + \sum_{j=1}^{q} B_j \operatorname{vech}(\Sigma_{t-j}) \tag{14.14}$$

for a vector $a_0 \in \mathbb{R}^{d(d+1)/2}$ and matrices $A_i$ and $B_j$ in $\mathbb{R}^{(d(d+1)/2)\times(d(d+1)/2)}$.


In this definition “vech” denotes the vector half operator, which stacks the 
columns of the lower triangle of a symmetric matrix in a single column vector 
of length d(d + 1)/2. Thus (14.14) should be understood as specifying the dynam- 
ics for the lower-triangular portion of the conditional covariance matrix, and the 
remaining elements of the matrix are determined by symmetry. 
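
The vector half operator is easy to state in code. The following sketch uses hypothetical helper names `vech` and `unvech`; the latter rebuilds the full matrix by symmetry, as described above.

```python
import numpy as np

def vech(S):
    """Stack the columns of the lower triangle of a symmetric matrix."""
    d = S.shape[0]
    return np.concatenate([S[j:, j] for j in range(d)])

def unvech(v, d):
    """Inverse of vech: rebuild the symmetric d x d matrix from its lower triangle."""
    S = np.zeros((d, d))
    pos = 0
    for j in range(d):
        S[j:, j] = v[pos:pos + d - j]
        pos += d - j
    return S + np.tril(S, -1).T       # fill the upper triangle by symmetry

S = np.array([[1.0, 0.2, 0.3],
              [0.2, 2.0, 0.4],
              [0.3, 0.4, 3.0]])
v = vech(S)                            # length d(d+1)/2 = 6
```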

In this very general form the model has $(1 + (p + q)d(d + 1)/2)\,d(d + 1)/2$
parameters; this number grows rapidly with dimension so that even a first-order
trivariate model has 78 parameters. The most common simplification has been to restrict
attention to cases when $A_i$ and $B_j$ are diagonal matrices, which gives us the diagonal VEC
or DVEC model. This special case can be written very elegantly in terms of a
different kind of matrix product, namely the Hadamard product, denoted by "$\circ$",
which signifies element-by-element multiplication of two matrices of the same size.
We obtain the representation

$$\Sigma_t = A_0 + \sum_{i=1}^{p} A_i \circ (X_{t-i}X_{t-i}') + \sum_{j=1}^{q} B_j \circ \Sigma_{t-j}, \tag{14.15}$$

where $A_0$ and the $A_i$ and $B_j$ must all be symmetric matrices in $\mathbb{R}^{d\times d}$ such that $A_0$ has
positive diagonal elements and all other matrices have non-negative diagonal elements
(standard univariate GARCH assumptions). This representation emphasizes
structural similarities with the univariate GARCH model of Definition 4.20.

To better understand the dynamic implications of (14.15), consider a bivariate
model of order (1, 1) and write $a_{0,ij}$, $a_{1,ij}$ and $b_{ij}$ for the elements of $A_0$, $A_1$ and
$B_1$, respectively. The model amounts to the following three simple equations:

$$\begin{aligned}
\sigma_{t,1}^2 &= a_{0,11} + a_{1,11}X_{t-1,1}^2 + b_{11}\sigma_{t-1,1}^2, \\
\sigma_{t,12} &= a_{0,12} + a_{1,12}X_{t-1,1}X_{t-1,2} + b_{12}\sigma_{t-1,12}, \\
\sigma_{t,2}^2 &= a_{0,22} + a_{1,22}X_{t-1,2}^2 + b_{22}\sigma_{t-1,2}^2.
\end{aligned} \tag{14.16}$$

The volatilities of the two component series ($\sigma_{t,1}$ and $\sigma_{t,2}$) follow univariate GARCH
updating patterns, and the conditional covariance $\sigma_{t,12}$ has a similar structure driven
by the products of the lagged values $X_{t-1,1}X_{t-1,2}$. As for the CCC and DCC models,
the volatility of a single component series is only driven by large lagged values of
that series and cannot be directly affected by large lagged values in another series;
the more general but overparametrized VEC model would allow this feature.

The requirement that $\Sigma_t$ in (14.15) should be a proper positive-definite covariance
matrix does impose conditions on the $A_0$, $A_i$ and $B_j$ matrices that we have not yet
discussed. In practice, in some software implementations of this model, formal
conditions are not imposed, other than that the matrices should be symmetric with
non-negative diagonal elements; the positive definiteness of the resulting estimates
of the conditional covariance matrices can be checked after model fitting.

However, a sufficient condition for $\Sigma_t$ to be almost surely positive definite is that
$A_0$ should be positive definite and the matrices $A_1, \ldots, A_p$, $B_1, \ldots, B_q$ should all
be positive semidefinite (see Notes and Comments), and this condition is easy to
impose. We can constrain all parameter matrices to have a form based on a Cholesky
decomposition; that is, we can parametrize the model in terms of lower-triangular
Cholesky factor matrices $A_0^{1/2}$, $A_i^{1/2}$ and $B_j^{1/2}$ such that

$$A_0 = A_0^{1/2}(A_0^{1/2})', \quad A_i = A_i^{1/2}(A_i^{1/2})', \quad B_j = B_j^{1/2}(B_j^{1/2})'. \tag{14.17}$$

Because the sufficient condition only prescribes that $A_1, \ldots, A_p$ and $B_1, \ldots, B_q$
should be positive semidefinite, we can in fact also consider much simpler
parametrizations, such as

$$A_0 = A_0^{1/2}(A_0^{1/2})', \quad A_i = a_i a_i', \quad B_j = b_j b_j', \tag{14.18}$$

where $a_i$ and $b_j$ are vectors in $\mathbb{R}^d$. An even cruder model, satisfying the requirement
of positive definiteness of $\Sigma_t$, would be

$$A_0 = A_0^{1/2}(A_0^{1/2})', \quad A_i = a_i I_d, \quad B_j = b_j I_d, \tag{14.19}$$

where $a_i$ and $b_j$ are simply positive constants. In fact, the specifications of the
multivariate ARCH and GARCH effects in (14.17)-(14.19) can be mixed and matched
in obvious ways.
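
A first-order DVEC recursion with the rank-one parametrization (14.18) can be sketched as follows. The parameter values are hypothetical, Gaussian innovations are assumed, and the loop simply records the smallest eigenvalue of $\Sigma_t$ to illustrate that positive definiteness is preserved.

```python
import numpy as np

def dvec_update(A0, A1, B1, x_prev, Sigma_prev):
    """One step of the first-order DVEC recursion (14.15).
    For NumPy arrays '*' is elementwise, i.e. the Hadamard product."""
    return A0 + A1 * np.outer(x_prev, x_prev) + B1 * Sigma_prev

# Hypothetical parameters in the style of (14.18).
A0_half = np.array([[0.3, 0.0],
                    [0.1, 0.2]])              # lower-triangular factor of A0
a = np.array([0.30, 0.25])
b = np.array([0.90, 0.95])
A0, A1, B1 = A0_half @ A0_half.T, np.outer(a, a), np.outer(b, b)

rng = np.random.default_rng(3)
Sigma = A0.copy()                              # assumed starting value
min_eigs = []
for _ in range(1000):
    x = np.linalg.cholesky(Sigma) @ rng.standard_normal(2)   # X_t = Sigma_t^{1/2} Z_t
    Sigma = dvec_update(A0, A1, B1, x, Sigma)
    min_eigs.append(np.linalg.eigvalsh(Sigma).min())
```

Since the Hadamard product of positive-semidefinite matrices is positive semidefinite, every $\Sigma_t$ dominates the positive-definite $A_0$, so `min(min_eigs)` stays above zero.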


The BEKK model of Baba, Engle, Kroner and Kraft. The next family of models
have the great advantage that their construction ensures the positive definiteness of
$\Sigma_t$ without the need for further conditions.


Definition 14.15. The process $(X_t)_{t\in\mathbb{Z}}$ is a BEKK process if it has the general
structure given in Definition 14.10 and if the conditional covariance matrix $\Sigma_t$
satisfies, for all $t \in \mathbb{Z}$,

$$\Sigma_t = A_0 + \sum_{i=1}^{p} A_i' X_{t-i}X_{t-i}' A_i + \sum_{j=1}^{q} B_j' \Sigma_{t-j} B_j, \tag{14.20}$$

where all coefficient matrices are in $\mathbb{R}^{d\times d}$ and $A_0$ is symmetric and positive definite.


Proposition 14.16. In the BEKK model (14.20), the conditional covariance matrix
$\Sigma_t$ is almost surely positive definite for all $t$.


Proof. Consider a first-order model for simplicity. For a vector $v \neq 0$ in $\mathbb{R}^d$ we
have

$$v'\Sigma_t v = v'A_0 v + (v'A_1'X_{t-1})^2 + (B_1 v)'\Sigma_{t-1}(B_1 v) > 0,$$

since the first term is strictly positive and the second and third terms are non-negative.


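To gain an understanding of the BEKK model it is again useful to consider the
bivariate special case of order (1, 1) and to consider the dynamics that are implied
while comparing these with equations (14.16):

$$\begin{aligned}
\sigma_{t,1}^2 = {}& a_{0,11} + a_{1,11}^2 X_{t-1,1}^2 + 2a_{1,11}a_{1,12}X_{t-1,1}X_{t-1,2} + a_{1,12}^2 X_{t-1,2}^2 \\
& + b_{11}^2\sigma_{t-1,1}^2 + 2b_{11}b_{12}\sigma_{t-1,12} + b_{12}^2\sigma_{t-1,2}^2,
\end{aligned} \tag{14.21}$$

$$\begin{aligned}
\sigma_{t,12} = {}& a_{0,12} + (a_{1,11}a_{1,22} + a_{1,12}a_{1,21})X_{t-1,1}X_{t-1,2} \\
& + a_{1,11}a_{1,21}X_{t-1,1}^2 + a_{1,22}a_{1,12}X_{t-1,2}^2 \\
& + (b_{11}b_{22} + b_{12}b_{21})\sigma_{t-1,12} + b_{11}b_{21}\sigma_{t-1,1}^2 + b_{22}b_{12}\sigma_{t-1,2}^2,
\end{aligned} \tag{14.22}$$

$$\begin{aligned}
\sigma_{t,2}^2 = {}& a_{0,22} + a_{1,22}^2 X_{t-1,2}^2 + 2a_{1,22}a_{1,21}X_{t-1,1}X_{t-1,2} + a_{1,21}^2 X_{t-1,1}^2 \\
& + b_{22}^2\sigma_{t-1,2}^2 + 2b_{22}b_{21}\sigma_{t-1,21} + b_{21}^2\sigma_{t-1,1}^2.
\end{aligned} \tag{14.23}$$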


From (14.21) it follows that we now have a model where a large lagged value of
the second component $X_{t-1,2}$ can influence the volatility of the first series $\sigma_{t,1}$. The
BEKK model has more parameters than the DVEC model and appears to have much
richer dynamics. Note, however, that the DVEC model cannot be obtained as a special
case of the BEKK model as we have defined it. To eliminate all crossover effects
in the conditional variance equations of the BEKK model in (14.21) and (14.23),
we would have to set the off-diagonal terms $a_{1,12}$, $a_{1,21}$, $b_{12}$ and $b_{21}$ to be zero and the
parameters governing the individual volatilities would also govern the conditional
covariance $\sigma_{t,12}$ in (14.22).
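
The crossover effect can be seen in a small numerical sketch of the first-order recursion (14.20). The parameter matrices are hypothetical; $A_1$ and $B_1$ are taken to be symmetric here so that the result does not depend on the index convention used for their off-diagonal elements.

```python
import numpy as np

def bekk_update(A0, A1, B1, x_prev, Sigma_prev):
    """One step of the first-order BEKK recursion (14.20):
    Sigma_t = A0 + A1' x x' A1 + B1' Sigma_{t-1} B1."""
    u = A1.T @ x_prev
    return A0 + np.outer(u, u) + B1.T @ Sigma_prev @ B1

# Hypothetical parameters (symmetric A1 and B1, chosen for illustration).
A0 = np.array([[0.10, 0.02],
               [0.02, 0.10]])
A1 = np.array([[0.30, 0.10],
               [0.10, 0.30]])
B1 = np.array([[0.90, 0.05],
               [0.05, 0.90]])
Sigma_prev = np.eye(2)

# Compare a calm previous day with a large shock in the second component only.
calm = bekk_update(A0, A1, B1, np.array([0.0, 0.0]), Sigma_prev)
shock = bekk_update(A0, A1, B1, np.array([0.0, 5.0]), Sigma_prev)
variance_increase_first_series = shock[0, 0] - calm[0, 0]
```

The conditional variance of the first series rises after a shock to the second series, which is not possible in the DVEC recursion (14.16).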

Table 14.1. Summary of numbers of parameters in various multivariate GARCH models: in
CCC it is assumed that the numbers of ARCH and GARCH terms for all volatility equations
are, respectively, p and q; in DCC it is assumed that the conditional correlation equation has
p + q parameters. The second column gives the general formula; the final columns give the
numbers for models of dimensions 2, 5 and 10 when p = q = 1. Additional parameters in
the innovation distribution are not considered.


Model                 Parameter count                   d = 2   d = 5   d = 10
VEC                   d(d+1)(1 + (p+q)d(d+1)/2)/2          21     465     6105
BEKK                  d(d+1)/2 + d^2 (p+q)                 11      65      255
DVEC as in (14.17)    d(d+1)(1 + p+q)/2                     9      45      165
DCC                   d(d+1)/2 + (d+1)(p+q)                 9      27       77
CCC                   d(d+1)/2 + d(p+q)                     7      25       75
DVEC as in (14.18)    d(d+1)/2 + d(p+q)                     7      25       75
DVEC as in (14.19)    d(d+1)/2 + (p+q)                      5      17       57


Remark 14.17. A broader definition of the BEKK class, which does subsume all
DVEC models, was originally given by Engle and Kroner (1995). In this definition
we have

$$\Sigma_t = A_0A_0' + \sum_{k=1}^{K}\sum_{i=1}^{p} A_{k,i}' X_{t-i}X_{t-i}' A_{k,i} + \sum_{k=1}^{K}\sum_{j=1}^{q} B_{k,j}' \Sigma_{t-j} B_{k,j},$$

where $\tfrac{1}{2}d(d+1) > K \geq 1$ and the choice of $K$ determines the richness of the model.
This model class is of largely theoretical interest and tends to be too complex for
practical applications; even the case $K = 1$ is difficult to fit in higher dimensions.


In Table 14.1 we have summarized the numbers of parameters in these models. 
Broad conclusions concerning the practical implications are as follows: the general 
VEC model is of purely theoretical interest; the BEKK and general DVEC models 
are for very low-dimensional use; and the remaining models are the most practically 
useful. 
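
The formulas in Table 14.1 can be evaluated directly. The sketch below uses a hypothetical helper `n_params` and reproduces the three numeric columns of the table for $p = q = 1$.

```python
def n_params(model, d, p=1, q=1):
    """Parameter counts from Table 14.1 (innovation-distribution parameters excluded)."""
    m = d * (d + 1) // 2      # number of distinct elements of a symmetric d x d matrix
    counts = {
        "VEC": m * (1 + (p + q) * m),
        "BEKK": m + d * d * (p + q),
        "DVEC (14.17)": m * (1 + p + q),
        "DCC": m + (d + 1) * (p + q),
        "CCC": m + d * (p + q),
        "DVEC (14.18)": m + d * (p + q),
        "DVEC (14.19)": m + (p + q),
    }
    return counts[model]

models = ["VEC", "BEKK", "DVEC (14.17)", "DCC", "CCC", "DVEC (14.18)", "DVEC (14.19)"]
table = {name: [n_params(name, d) for d in (2, 5, 10)] for name in models}
```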


14.2.4 Fitting Multivariate GARCH Models


Model fitting. We have already given notes on fitting some models in stages and 
it should be stressed that in the high-dimensional applications of risk management 
this may in fact be the only feasible strategy. Where interest centres on a multivariate 
risk-factor return series of more modest dimension (fewer than ten dimensions, say), 
we can attempt to fit multivariate GARCH models by maximizing an appropriate 
likelihood with respect to all parameters in a single step. The procedure follows 
from the method for univariate time series described in Section 4.2.4. 

The method of building a likelihood for a generic multivariate GARCH model
$X_t = \Sigma_t^{1/2} Z_t$ is completely analogous to the univariate case; consider again a first-order
model ($p = q = 1$) for simplicity and assume that our data are labelled
$X_0, X_1, \ldots, X_n$. A conditional likelihood is based on the conditional joint density
of $X_1, \ldots, X_n$ given $X_0$ and an initial value $\Sigma_0$ for the conditional covariance
matrix. This conditional joint density is

$$f_{X_1,\ldots,X_n \mid X_0,\Sigma_0}(x_1, \ldots, x_n \mid x_0, \Sigma_0) = \prod_{t=1}^{n} f_{X_t \mid X_{t-1},\ldots,X_0,\Sigma_0}(x_t \mid x_{t-1}, \ldots, x_0, \Sigma_0).$$


If we denote the multivariate innovation density of $Z_t$ by $g(z)$, then we have

$$f_{X_t \mid X_{t-1},\ldots,X_0,\Sigma_0}(x_t \mid x_{t-1}, \ldots, x_0, \Sigma_0) = |\Sigma_t|^{-1/2} g(\Sigma_t^{-1/2}x_t),$$

where $\Sigma_t$ is a matrix-valued function of $x_{t-1}, \ldots, x_0$ and $\Sigma_0$. Most common choices
of $g(z)$ are in the spherical family, so by (6.39) we have $g(z) = h(z'z)$ for some
function $h$ of a scalar variable (known as a density generator), yielding a conditional
likelihood of the form

$$L(\theta; X_1, \ldots, X_n) = \prod_{t=1}^{n} |\Sigma_t|^{-1/2} h(X_t'\Sigma_t^{-1}X_t),$$

where all parameters appearing in the volatility equation and the innovation distribution
are collected in $\theta$. It would of course be possible to add a constant mean term
or a conditional mean term with, say, vector autoregressive structure to the model
and to adapt the likelihood accordingly.

Evaluation of the likelihood requires us to input a value for $\Sigma_0$. Maximization
can again be performed in practice using a modified Newton-Raphson procedure,
such as that of Berndt et al. (1974). References concerning properties of estimators
are given in Notes and Comments, although the literature for multivariate GARCH
is small.
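
For Gaussian innovations the density generator is $h(s) = (2\pi)^{-d/2}e^{-s/2}$, and the logarithm of the conditional likelihood can be evaluated as in the following sketch. The function name is hypothetical, and the path of conditional covariance matrices is taken as given; in an actual fit it would be produced from $\theta$ and $\Sigma_0$ by the model's recursion and the result passed to a numerical optimizer.

```python
import numpy as np

def gaussian_mgarch_loglik(X, Sigmas):
    """Sum over t of log(|Sigma_t|^{-1/2} h(X_t' Sigma_t^{-1} X_t))
    with the Gaussian density generator h(s) = (2*pi)^{-d/2} exp(-s/2)."""
    n, d = X.shape
    total = 0.0
    for x, S in zip(X, Sigmas):
        L = np.linalg.cholesky(S)                    # S = L L'
        z = np.linalg.solve(L, x)                    # z'z = x' S^{-1} x
        log_det = 2.0 * np.sum(np.log(np.diag(L)))   # log |S|
        total += -0.5 * (d * np.log(2 * np.pi) + log_det + z @ z)
    return total

# Tiny check case: zero observations with identity covariance matrices.
X_demo = np.zeros((3, 2))
Sigmas_demo = np.array([np.eye(2)] * 3)
ll = gaussian_mgarch_loglik(X_demo, Sigmas_demo)
```

Working with the Cholesky factor avoids forming $\Sigma_t^{-1}$ explicitly and gives the determinant as a by-product.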

Model checking and comparison. Residuals are calculated according to
$\hat{Z}_t = \hat{\Sigma}_t^{-1/2} X_t$ and should behave like a realization of an SWN$(0, I_d)$ process. The usual
univariate procedures (correlograms, correlograms of absolute values, and portmanteau
tests such as Ljung-Box) can be applied to the component series of the residuals.
Also, there should not be any evidence of cross-correlations at any lags for either
the raw or the absolute residuals in the cross-correlogram.
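
The residual calculation and a basic cross-correlation check can be sketched as follows. The helper names are hypothetical, the Cholesky factor is used as the square root, and a constant covariance path with Gaussian data serves as placeholder input.

```python
import numpy as np

def residuals(X, Sigmas):
    """Z_t = Sigma_t^{-1/2} X_t, using the Cholesky factor as the square root."""
    return np.array([np.linalg.solve(np.linalg.cholesky(S), x)
                     for x, S in zip(X, Sigmas)])

def cross_correlation(Z, h):
    """Sample cross-correlation matrix at lag h >= 0; entry (i, j) estimates
    the correlation between Z_{t+h,i} and Z_{t,j}."""
    n = Z.shape[0]
    Zc = Z - Z.mean(axis=0)
    gamma_h = Zc[h:].T @ Zc[:n - h] / n
    s = np.sqrt(np.diag(Zc.T @ Zc / n))
    return gamma_h / np.outer(s, s)

rng = np.random.default_rng(4)
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])
Sigmas = np.repeat(Sigma[None, :, :], 5000, axis=0)     # constant placeholder path
X = rng.standard_normal((5000, 2)) @ np.linalg.cholesky(Sigma).T
Z_hat = residuals(X, Sigmas)
R0 = cross_correlation(Z_hat, 0)
R1 = cross_correlation(Z_hat, 1)
R1_abs = cross_correlation(np.abs(Z_hat), 1)            # same check on absolute values
```

For an adequate model the off-diagonal entries of `R0` and all entries of `R1` and `R1_abs` should be close to zero.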

Model selection is usually performed by a standard comparison of AIC numbers, 
although there is limited literature on theoretical aspects of the use of the AIC in a 
multivariate GARCH context. 


14.2.5 Dimension Reduction in MGARCH 


In view of the large number of parameters in MGARCH models and the practical 
difficulties involved in their estimation, as well as the fact that many time series of 
risk-factor changes are likely to have high levels of cross-correlation at lag zero, it 
makes sense to consider how dimension-reduction strategies from the area of factor 
modelling (see Section 6.4.1) can be applied in the context of multivariate GARCH. 

As discussed in Section 6.4.2 there are a number of different statistical approaches 
to building a factor model. We give brief notes on strategies for constructing (1) a 
macroeconomic factor model and (2) a statistical factor model based on principal 
component analysis. 

Macroeconomic factor model. In this approach we attempt to identify a small
number of common factors $F_t$ that can explain the variation in many risk factors $X_t$;
we might, for example, use stock index returns to explain the variation in individual
equity returns. These common factors could be modelled using relatively detailed
multivariate models incorporating ARMA and GARCH features.

The dependence of the individual returns on the factor returns can then be modelled
by calibrating a factor model of the type

$$X_t = a + BF_t + \varepsilon_t, \quad t = 1, \ldots, n.$$

In Section 6.4.3 we showed how this may be done in a static way using regression
techniques. We now assume that, conditional on the factors $F_t$, the errors $\varepsilon_t$ form a
multivariate white noise process with GARCH volatility structure.

In a perfect factor model (corresponding to Definition 6.31) these errors would 
have a diagonal covariance matrix and would be attributable to idiosyncratic effects 
alone. In GARCH terms we could assume they follow a pure diagonal model, i.e. a 
CCC model where the constant conditional correlation matrix is the identity matrix. 
A pure diagonal model can be fitted in two ways, which correspond to the two ways 
of estimating a static regression model in Section 6.4.3. 


(1) Fit univariate models to the component series $X_{1,k}, \ldots, X_{n,k}$, $k = 1, \ldots, d$.
    For each $k$ assume that

    $$X_{t,k} = \mu_{t,k} + \varepsilon_{t,k}, \quad \mu_{t,k} = a_k + b_k'F_t, \quad t = 1, \ldots, n,$$

    where the errors $\varepsilon_{t,k}$ follow some univariate GARCH specification. Most
    software for GARCH estimation allows covariates to be incorporated into the
    model for the conditional mean term $\mu_{t,k}$.

(2) Fit in one step the multivariate model

    $$X_t = \mu_t + \varepsilon_t, \quad \mu_t = a + BF_t, \quad t = 1, \ldots, n,$$

    where the errors $\varepsilon_t$ follow a pure diagonal CCC model and the SWN$(0, I_d)$
    process driving the GARCH model is some non-Gaussian spherical distribution,
    such as an appropriately scaled $t$ distribution. (If the SWN is Gaussian,
    approaches (1) and (2) give the same results.)


In practice, it is never possible to find the “right” common factors such that the 
idiosyncratic errors have a diagonal covariance structure. The pure diagonal assump- 
tion can be examined by looking at the errors from the GARCH modelling, esti- 
mating their correlation matrix and assessing its closeness to the identity matrix. In 
the case where correlation structure remains, the formal concept of the factor model 
can be loosened by allowing errors with a CCC-GARCH structure, which can be 
calibrated by two-stage estimation as described in Section 14.2.2. 
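
The check of the pure diagonal assumption can be sketched in simplified form. In this illustration the factor model is calibrated by ordinary least squares and the errors are simulated without GARCH effects, so the least-squares residuals stand in for the errors from the GARCH modelling; all data and parameter values are placeholders.

```python
import numpy as np

rng = np.random.default_rng(5)
n, d, k = 2000, 4, 2
F = rng.standard_normal((n, k))                    # placeholder factor returns
a_true = np.array([0.1, 0.0, -0.1, 0.2])
B_true = rng.uniform(0.5, 1.5, size=(d, k))        # placeholder factor loadings
eps = 0.5 * rng.standard_normal((n, d))            # idiosyncratic errors (no GARCH here)
X = a_true + F @ B_true.T + eps                    # X_t = a + B F_t + eps_t

design = np.column_stack([np.ones(n), F])          # regressors (1, F_t')
coef, *_ = np.linalg.lstsq(design, X, rcond=None)  # shape (k + 1, d)
a_hat, B_hat = coef[0], coef[1:].T
resid = X - design @ coef

# Closeness of the residual correlation matrix to the identity matrix.
R = np.corrcoef(resid, rowvar=False)
max_offdiag = np.max(np.abs(R - np.eye(d)))
```

A large value of `max_offdiag` would suggest residual correlation structure, in which case errors with a CCC-GARCH structure are the natural relaxation.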

Principal components GARCH (PC-GARCH). In Section 6.4.5 we explained the
application of principal component analysis to time-series data under the assump- 
tion that the data came from a stationary multivariate time-series process. The idea 
of principal components GARCH is that we first compute time series of sample 
principal components and then model the most important series—the ones with 
highest sample variance—using GARCH models. Usually we attempt to fit a series 
of independent univariate GARCH models to the sample principal components. 

In terms of an underlying model we are implicitly assuming that the data are 
realizations from a so-called PC-GARCH or orthogonal GARCH model defined as 
follows. 


Definition 14.18. The process $(X_t)_{t\in\mathbb{Z}}$ follows a PC-GARCH (or orthogonal
GARCH) model if there exists some orthogonal matrix $\Gamma \in \mathbb{R}^{d\times d}$ satisfying
$\Gamma\Gamma' = \Gamma'\Gamma = I_d$ such that $(\Gamma'X_t)_{t\in\mathbb{Z}}$ follows a pure diagonal GARCH model.
The process $(X_t)_{t\in\mathbb{Z}}$ follows a PC-GARCH model with mean $\mu$ if $(X_t - \mu)_{t\in\mathbb{Z}}$
follows a PC-GARCH model.


If $(X_t)_{t \in \mathbb{Z}}$ follows a PC-GARCH process for some matrix $\Gamma$, then we can
introduce the process $(Y_t)_{t \in \mathbb{Z}}$, defined by $Y_t = \Gamma' X_t$, which satisfies
$Y_t = \Delta_t Z_t$, where $(Z_t)_{t \in \mathbb{Z}}$ is SWN$(0, I_d)$ and $\Delta_t$ is a (diagonal) volatility
matrix with elements that are updated according to univariate GARCH schemes and
past values of the components of $Y_t$. Since $X_t = \Gamma \Delta_t Z_t$, the conditional and
unconditional covariance matrices have the structure

$$\Sigma_t = \Gamma \Delta_t^2 \Gamma', \quad \Sigma = \Gamma E(\Delta_t^2) \Gamma', \tag{14.24}$$


and they are obviously symmetric and positive definite. 

Comparing (14.24) with (6.59) we see that the PC-GARCH model implies a
spectral decomposition of the conditional and unconditional covariance matrices.
The eigenvalues of the conditional covariance matrix, which are the elements of the
diagonal matrix $\Delta_t^2$, are given a GARCH updating structure. The eigenvectors form
the columns of $\Gamma$ and are used to construct the time series $(Y_t)_{t \in \mathbb{Z}}$, the principal
components transform of $(X_t)_{t \in \mathbb{Z}}$. It should be noted that despite the simple structure
of (14.24), the conditional correlation matrix of $X_t$ is not constant in this model.
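
The last remark can be checked numerically. The following is a minimal sketch, assuming a hypothetical two-dimensional rotation for $\Gamma$ and two made-up sets of conditional variances; the function names are illustrative and do not come from the text.

```python
import numpy as np

def pc_garch_covariance(gamma, variances):
    """Return Sigma_t = Gamma diag(variances) Gamma' as in (14.24)."""
    gamma = np.asarray(gamma, dtype=float)
    return gamma @ np.diag(variances) @ gamma.T

def correlation_from_covariance(sigma):
    """Convert a covariance matrix into the corresponding correlation matrix."""
    s = np.sqrt(np.diag(sigma))
    return sigma / np.outer(s, s)

# Hypothetical orthogonal matrix: a rotation by 30 degrees.
angle = np.pi / 6
Gamma = np.array([[np.cos(angle), -np.sin(angle)],
                  [np.sin(angle),  np.cos(angle)]])

# Two hypothetical sets of conditional variances for the principal components.
Sigma_calm = pc_garch_covariance(Gamma, [1.0, 0.5])
Sigma_stressed = pc_garch_covariance(Gamma, [4.0, 0.5])

rho_calm = correlation_from_covariance(Sigma_calm)[0, 1]
rho_stressed = correlation_from_covariance(Sigma_stressed)[0, 1]
```

Only the diagonal matrix changes between the two cases, yet the implied correlation between the components of $X_t$ differs, because the eigenvalues enter the correlation through their ratio.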

As noted above, this model is estimated in two stages. Let us suppose we have
risk-factor change data $X_1, \ldots, X_n$ assumed to come from a PC-GARCH model
with mean $\mu$. In the first step we calculate the spectral decomposition of the
sample covariance matrix $S$ of the data as in Section 6.4.5; this gives us an
estimator $G$ of $\Gamma$. We then rotate the data to obtain sample principal components
$\{Y_t = G'(X_t - \bar{X}): t = 1, \ldots, n\}$, where $\bar{X} = n^{-1} \sum_{t=1}^{n} X_t$. These should be
consistent with a pure diagonal model if the PC-GARCH model is appropriate for the
original data; there should be no cross-correlation between the series at any lag. In a
second stage we fit univariate GARCH models to the time series of principal
components; the residuals from these GARCH models should behave like SWN$(0, I_d)$.
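
The first stage can be sketched as follows. This is an illustration only: the function names are hypothetical, and the second function merely filters conditional variances for given GARCH(1,1) parameters rather than estimating them, which in practice would be done by maximum likelihood with dedicated software.

```python
import numpy as np

def sample_principal_components(X):
    """First stage of PC-GARCH: rotate centred data by the eigenvectors of S."""
    X = np.asarray(X, dtype=float)
    x_bar = X.mean(axis=0)
    S = np.cov(X, rowvar=False)              # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(S)     # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]        # largest sample variance first
    G = eigvecs[:, order]
    Y = (X - x_bar) @ G                      # row t holds (G'(X_t - x_bar))'
    return G, eigvals[order], Y

def garch11_variance_path(y, omega, alpha, beta):
    """Filter sigma_t^2 for GIVEN GARCH(1,1) parameters (no estimation here)."""
    sigma2 = np.empty(len(y))
    sigma2[0] = np.var(y)                    # starting value: sample variance
    for t in range(1, len(y)):
        sigma2[t] = omega + alpha * y[t - 1] ** 2 + beta * sigma2[t - 1]
    return sigma2
```

By construction the columns of `Y` have zero sample cross-correlation at lag zero; cross-correlations at other lags still have to be inspected before the pure diagonal assumption is accepted.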

To turn the model output into a factor model we use the idea embodied in
equation (6.65), namely that the first $k$ loading vectors in the matrix $G$ specify the
most important principal components, and we write these columns in the submatrix
$G_1 \in \mathbb{R}^{d \times k}$. We define the factors to be $F_t := (Y_{t,1}, \ldots, Y_{t,k})'$, $t = 1, \ldots, n$, the
first $k$ principal component time series. We have now calibrated an approximate
factor model of the form $X_t = \bar{X} + G_1 F_t + \varepsilon_t$, where the components of $F_t$ are
assumed to follow a pure diagonal GARCH model of dimension $k$. We can use this
model to forecast the behaviour of the risk-factor changes $X_t$ by first forecasting the
behaviour of the factors $F_t$; the error term is usually ignored in practice.


14.2.6 MGARCH and Conditional Risk Measurement 


Suppose we calibrate an MGARCH model (possibly with VARMA conditional
mean structure) having the general structure $X_t = \mu_t + \Sigma_t^{1/2} Z_t$ to historical
risk-factor return data $X_{t-n+1}, \ldots, X_t$. We are interested in the loss distribution of
$L_{t+1} = l_{[t]}(X_{t+1})$ conditional on $\mathcal{F}_t = \sigma(\{X_s : s \leq t\})$, as described in Sections 9.1
and 9.2.1. (We may also be interested in longer-period losses as in Section 9.2.7.)

A general method that could be applied is the Monte Carlo method of
Section 9.2.5: we could simulate many times the next value $X_{t+1}$ (and subsequent
values for losses over longer periods) of the stochastic process $(X_t)_{t \in \mathbb{Z}}$, using
estimates of $\mu_{t+1}$ and $\Sigma_{t+1}$.

Alternatively, a variance-covariance calculation could be made as in Section 9.2.2.
Considering a linearized loss operator with the general form $l^{\Delta}_{[t]}(x) = -(c_t + b_t' x)$,
the moments of the conditional loss distribution would be

$$E(L^{\Delta}_{t+1} \mid \mathcal{F}_t) = -c_t - b_t' \mu_{t+1}, \quad \operatorname{cov}(L^{\Delta}_{t+1} \mid \mathcal{F}_t) = b_t' \Sigma_{t+1} b_t.$$

Under an assumption of multivariate Gaussian innovations, the conditional
distribution of $L^{\Delta}_{t+1}$ given $\mathcal{F}_t$ would be univariate Gaussian as in equation (2.14). Under an
assumption of (scaled) multivariate $t$ innovations, it would be univariate $t$. To
calculate VaR and ES we can follow the calculations in Examples 2.11, 2.14 and 2.15 but
we would first need to compute estimates of $\Sigma_{t+1}$ and $\mu_{t+1}$ using our multivariate
time-series model, as indicated in Example 14.19.
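
For the Gaussian case the calculation can be sketched with the standard library alone. The function below is an illustrative helper, not part of the text; it assumes that forecasts `mu_next` and `sigma_next` of the conditional mean vector and covariance matrix are already available, and it uses the standard closed forms for VaR and ES of a normal distribution.

```python
from math import sqrt
from statistics import NormalDist

def gaussian_var_es(c, b, mu_next, sigma_next, alpha=0.99):
    """VaR and ES of L = -(c + b'X) when X | F_t ~ N(mu_next, sigma_next)."""
    d = len(b)
    # Conditional mean and variance of the linearized loss.
    mean = -c - sum(b[i] * mu_next[i] for i in range(d))
    variance = sum(b[i] * sigma_next[i][j] * b[j]
                   for i in range(d) for j in range(d))
    sd = sqrt(variance)
    std_normal = NormalDist()
    q = std_normal.inv_cdf(alpha)
    var = mean + sd * q
    es = mean + sd * std_normal.pdf(q) / (1.0 - alpha)
    return var, es
```

For instance, `gaussian_var_es(0.0, [1.0, 1.0], [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])` treats a loss with conditional standard deviation $\sqrt{2}$.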


Example 14.19. Consider the simple stock portfolio in Example 2.1, where $c_t = 0$
and $b_t = V_t w_t$, and suppose our time-series model is a first-order DVEC model
with a constant mean term. The model takes the form

$$X_t - \mu = \Sigma_t^{1/2} Z_t, \quad \Sigma_t = A_0 + A_1 \circ ((X_{t-1} - \mu)(X_{t-1} - \mu)') + B \circ \Sigma_{t-1}. \tag{14.25}$$

Suppose we assume that the innovations are multivariate Student $t$. The standard
risk measures applied to the linearized loss distribution would take the form

$$\mathrm{VaR}^t_\alpha = -V_t w_t' \mu + V_t \sqrt{\frac{w_t' \Sigma_{t+1} w_t (\nu - 2)}{\nu}} \; t_\nu^{-1}(\alpha),$$

$$\mathrm{ES}^t_\alpha = -V_t w_t' \mu + V_t \sqrt{\frac{w_t' \Sigma_{t+1} w_t (\nu - 2)}{\nu}} \; \frac{g_\nu(t_\nu^{-1}(\alpha))}{1 - \alpha} \, \frac{\nu + (t_\nu^{-1}(\alpha))^2}{\nu - 1},$$

where the notation is as in Example 2.15. Estimates of the risk measures are obtained
by replacing $\mu$, $\nu$ and $\Sigma_{t+1}$ by estimates. The latter can be calculated iteratively
from (14.25) using estimates of $A_0$, $A_1$ and $B$ and a starting value for $\Sigma_0$.
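
The iterative calculation of the covariance forecast can be sketched as below. The function name is hypothetical, the parameter matrices are taken as given (they would come from fitting the model), and `*` between NumPy arrays is the element-wise product that plays the role of $\circ$ in (14.25).

```python
import numpy as np

def dvec_next_covariance(X, mu, A0, A1, B, Sigma0):
    """Iterate (14.25) through the rows of X and return the final forecast."""
    Sigma = np.asarray(Sigma0, dtype=float)
    for x in np.asarray(X, dtype=float):
        e = x - mu
        # Hadamard products: each covariance element has its own recursion.
        Sigma = A0 + A1 * np.outer(e, e) + B * Sigma
    return Sigma
```

Feeding in the observations up to time $t$ yields the matrix used as the estimate of $\Sigma_{t+1}$; positive definiteness depends on the parametrization of $A_0$, $A_1$ and $B$ and is not checked in this sketch.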


As an alternative to formal estimation of MGARCH models we can also use
the technique of multivariate exponentially weighted moving average (EWMA)
forecasting to compute an estimate of $\Sigma_{t+1}$. The recursive procedure has been
described in Section 9.2.2.
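
A minimal sketch of such a recursion follows. The weighting convention used here (the weight `theta` is placed on the previous estimate and `1 - theta` on the newest outer product) is an assumption made for illustration; Section 9.2.2 gives the exact form used in the text, and the function names are hypothetical.

```python
import numpy as np

def ewma_update(Sigma, x, mu, theta):
    """One EWMA step: blend the previous estimate with the newest outer product."""
    e = np.asarray(x, dtype=float) - mu
    return theta * np.asarray(Sigma, dtype=float) + (1.0 - theta) * np.outer(e, e)

def ewma_forecast(X, mu, theta, Sigma0):
    """Run the recursion through the rows of X and return the final estimate."""
    Sigma = np.asarray(Sigma0, dtype=float)
    for x in np.asarray(X, dtype=float):
        Sigma = ewma_update(Sigma, x, mu, theta)
    return Sigma
```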


Notes and Comments 


The CCC-GARCH model was suggested by Bollerslev (1990), who used it to model 
European exchange-rate data before and after the introduction of the European 
Monetary System (EMS) and came to the expected conclusion that conditional 
correlations after the introduction of the EMS were higher. The idea of the DCC 
model is explored by Engle (2002), Engle and Sheppard (2001) and Tse and Tsui 
(2002). Fitting in stages is promoted in the formulation of Engle and Sheppard 
(2001), and asymptotic statistical theory for this procedure is given. Hafner and 
Franses (2009) suggest that the dynamics of CCC are too simple for collections of 
many asset returns and give a generalization. See also the textbook by Engle (2009). 

The DVEC model was proposed by Bollerslev, Engle and Wooldridge (1988). The 
more general (but overparametrized) VEC model is discussed in Engle and Kroner 
(1995) alongside the BEKK model, named after these two authors as well as Baba 
and Kraft, who co-authored an earlier unpublished manuscript. The condition for the 
positive definiteness of X; in (14.15), which suggests the parametrizations (14.17)- 
(14.19), is described in Attanasio (1991). 

There is limited work on statistical properties of quasi-maximum likelihood esti- 
mates in multivariate models: Jeantheau (1998) shows consistency for a general 
formulation and Comte and Lieberman (2003) show asymptotic normality for the 
BEKK formulation. 

The PC-GARCH model was first described by Ding (1994) in a PhD thesis; under 
the name of orthogonal GARCH it has been extensively investigated by Alexander 
(2001). The latter shows how PC-GARCH can be used as a dimension-reduction 
tool for expressing the conditional covariances of a number of asset return series in 
terms of a much smaller number of principal component return series. 

Survey articles by Bollerslev, Engle and Nelson (1994) and Bauwens, Laurent 
and Rombouts (2006) are useful sources of additional information and references 
for all of these multivariate models. 
Advanced Topics in Multivariate Modelling 


This chapter contains selected advanced topics that extend the presentation of multi- 
variate models and copulas in Chapters 6 and 7. In Section 15.1 we treat topics that 
relate to multivariate normal mixture distributions and elliptical distributions. In 
Section 15.2 we extend the theory of Archimedean copulas and consider models 
that are more general or more flexible than those presented in Section 7.4. 


15.1 Normal Mixture and Elliptical Distributions 


This section addresses two statistical issues that were raised in Chapter 6. In Sec- 
tion 15.1.1 we give the algorithm for fitting multivariate generalized hyperbolic 
distributions to data, which was used to create the examples in Section 6.2.4. In 
Section 15.1.2 we discuss empirical tests for elliptical symmetry. 


15.1.1 Estimation of Generalized Hyperbolic Distributions 


While univariate GH models have been fitted to return data in many empirical stud- 
ies, there has been relatively little applied work with the multivariate distributions. 
However, normal mixture distributions of the kind we have described may be fit- 
ted with algorithms of the EM (expectation—maximization) type. In this section we 
present an algorithm for that purpose and sketch the ideas behind it. Similar methods 
have been developed independently by other authors and references may be found 
in Notes and Comments. 

Assume we have iid data $X_1, \ldots, X_n$ and wish to fit the multivariate GH
distribution, or one of its special cases. Summarizing the parameters by
$\theta = (\lambda, \chi, \psi, \mu, \Sigma, \gamma)'$, the problem is to maximize

$$\ln L(\theta; X_1, \ldots, X_n) = \sum_{i=1}^{n} \ln f_X(X_i; \theta), \tag{15.1}$$

where $f_X(x; \theta)$ denotes the GH density in (6.29).

This problem is not particularly easy at first sight due to the number of parameters
and the necessity of maximizing over covariance matrices $\Sigma$. However, if we were
able to "observe" the latent mixing variables $W_1, \ldots, W_n$ coming from the mixture
representation in (6.24), it would be much easier. Since the joint density of any pair
$X_i$ and $W_i$ is given by

$$f_{X,W}(x, w; \theta) = f_{X|W}(x \mid w; \mu, \Sigma, \gamma) \, h_W(w; \lambda, \chi, \psi), \tag{15.2}$$

we could construct the likelihood

$$\ln \tilde{L}(\theta; X_1, \ldots, X_n, W_1, \ldots, W_n) = \sum_{i=1}^{n} \ln f_{X|W}(X_i \mid W_i; \mu, \Sigma, \gamma) + \sum_{i=1}^{n} \ln h_W(W_i; \lambda, \chi, \psi), \tag{15.3}$$

where the two terms could be maximized separately with respect to the parameters
they involve. The apparently more problematic parameters of $\Sigma$ and $\gamma$ are in the
first term of the likelihood, and estimates are relatively easy to derive due to the
Gaussian form of this term.

To overcome the latency of the $W_i$ data the EM algorithm is used. This is an
iterative procedure consisting of an E-step, or expectation step (where essentially
$W_i$ is replaced with an estimate given the observed data and current parameter
estimates), and an M-step, or maximization step (where the parameter estimates
are updated). Suppose at the beginning of step $k$ we have the vector of parameter
estimates $\theta^{[k]}$. We proceed as follows.


E-step. We calculate the conditional expectation of the so-called augmented
likelihood (15.3) given the data $X_1, \ldots, X_n$ using the parameter values $\theta^{[k]}$. This
results in the objective function

$$Q(\theta; \theta^{[k]}) = E(\ln \tilde{L}(\theta; X_1, \ldots, X_n, W_1, \ldots, W_n) \mid X_1, \ldots, X_n; \theta^{[k]}).$$


M-step. We maximize the objective function with respect to $\theta$ to obtain the next
set of estimates $\theta^{[k+1]}$.


Alternating between these steps, the EM algorithm produces improved parameter 
estimates at each step (in the sense that the value of the original likelihood (15.1) is 
continually increased) and they converge to the maximum likelihood (ML) estimates. 

In practice, performing the E-step amounts to replacing any functions $g(W_i)$ of the
latent mixing variables that arise in (15.3) by the quantities $E(g(W_i) \mid X_i; \theta^{[k]})$. To
calculate these quantities we can observe that the conditional density of $W_i$ given $X_i$
satisfies $f_{W|X}(w \mid x; \theta) \propto f_{W,X}(w, x; \theta)$, up to some constant of proportionality.
It may therefore be deduced from (15.2) that

$$W_i \mid X_i \sim N^{-}\bigl(\lambda - \tfrac{1}{2}d, \; (X_i - \mu)' \Sigma^{-1} (X_i - \mu) + \chi, \; \psi + \gamma' \Sigma^{-1} \gamma\bigr). \tag{15.4}$$

If we write out the likelihood (15.3) using (6.25) for the first term and the
generalized inverse Gaussian (GIG) density (A.14) for the second term, we find
that the functions $g(W_i)$ arising in (15.3) are $g_1(w) = w$, $g_2(w) = 1/w$ and
$g_3(w) = \ln(w)$. The conditional expectation of these functions in model (15.4) may
be evaluated using information about the GIG distribution in Section A.2.5; note
that $E(\ln(W_i) \mid X_i; \theta^{[k]})$ involves derivatives of a Bessel function with respect to
order and must be approximated numerically. We will introduce the notation

$$\delta_i^{[\cdot]} = E(W_i^{-1} \mid X_i; \theta^{[\cdot]}), \quad \eta_i^{[\cdot]} = E(W_i \mid X_i; \theta^{[\cdot]}), \quad \xi_i^{[\cdot]} = E(\ln(W_i) \mid X_i; \theta^{[\cdot]}), \tag{15.5}$$

which allows us to describe the basic EM scheme as well as a variant below.
In the M-step there are two terms to maximize, coming from the two terms
in (15.3); we write these as $Q_1(\mu, \Sigma, \gamma; \theta^{[\cdot]})$ and $Q_2(\lambda, \chi, \psi; \theta^{[\cdot]})$. To address the
identifiability issue mentioned in Section 6.2.3 we constrain the determinant of $\Sigma$ to
be some fixed value (in practice, we take the determinant of the sample covariance
matrix $S$) in the maximization of $Q_1$. The maximizing values of $\mu$, $\Sigma$ and $\gamma$ may
then be derived analytically by calculating partial derivatives and setting these equal
to zero; the resulting formulas are embedded in Algorithm 15.1 below (see steps (3)
and (4)). The maximization of $Q_2(\lambda, \chi, \psi; \theta^{[\cdot]})$ with respect to the parameters of
the mixing distribution is performed numerically; the function $Q_2(\lambda, \chi, \psi; \theta^{[\cdot]})$ is

$$(\lambda - 1) \sum_{i=1}^{n} \xi_i^{[\cdot]} - \tfrac{1}{2}\chi \sum_{i=1}^{n} \delta_i^{[\cdot]} - \tfrac{1}{2}\psi \sum_{i=1}^{n} \eta_i^{[\cdot]} - \tfrac{1}{2} n\lambda \ln(\chi) + \tfrac{1}{2} n\lambda \ln(\psi) - n \ln\bigl(2K_\lambda(\sqrt{\chi\psi})\bigr). \tag{15.6}$$

This would complete one iteration of a standard EM algorithm. However, there are
a couple of variants on the basic scheme; both involve modification of the final step
described above, namely the maximization of $Q_2$.

Assuming the parameters $\mu$, $\Sigma$ and $\gamma$ have been updated first in iteration $k$, we
define

$$\theta^{[k,2]} = (\lambda^{[k]}, \chi^{[k]}, \psi^{[k]}, \mu^{[k+1]}, \Sigma^{[k+1]}, \gamma^{[k+1]})',$$

recalculate the weights $\delta_i^{[k,2]}$, $\eta_i^{[k,2]}$ and $\xi_i^{[k,2]}$ in (15.5), and then maximize the
function $Q_2(\lambda, \chi, \psi; \theta^{[k,2]})$ in (15.6). This results in a so-called MCECM algorithm
(multi-cycle, expectation, conditional maximization), which is the one we present
below.

Alternatively, instead of maximizing $Q_2$ we may maximize the original
likelihood (15.1) with respect to $\lambda$, $\chi$ and $\psi$ with the other parameters held fixed at the
values $\mu^{[k]}$, $\Sigma^{[k]}$ and $\gamma^{[k]}$; this results in an ECME algorithm.


Algorithm 15.1 (EM estimation of GH distribution). 


(1) Set iteration count $k = 1$ and select starting values for $\theta^{[1]}$. In particular,
reasonable starting values for $\mu$, $\gamma$ and $\Sigma$, respectively, are the sample mean,
the zero vector and the sample covariance matrix $S$.


(2) Calculate weights $\delta_i^{[k]}$ and $\eta_i^{[k]}$ using (15.5), (15.4) and (A.15). Average the
weights to get

$$\bar{\delta}^{[k]} = n^{-1} \sum_{i=1}^{n} \delta_i^{[k]} \quad \text{and} \quad \bar{\eta}^{[k]} = n^{-1} \sum_{i=1}^{n} \eta_i^{[k]}.$$


(3) For a symmetric model set $\gamma^{[k+1]} = 0$. Otherwise set

$$\gamma^{[k+1]} = \frac{n^{-1} \sum_{i=1}^{n} \delta_i^{[k]} (\bar{X} - X_i)}{\bar{\delta}^{[k]} \bar{\eta}^{[k]} - 1}.$$


(4) Update estimates of the location vector and dispersion matrix by

$$\mu^{[k+1]} = \frac{n^{-1} \sum_{i=1}^{n} \delta_i^{[k]} X_i - \gamma^{[k+1]}}{\bar{\delta}^{[k]}},$$

$$\Psi = \frac{1}{n} \sum_{i=1}^{n} \delta_i^{[k]} (X_i - \mu^{[k+1]})(X_i - \mu^{[k+1]})' - \bar{\eta}^{[k]} \gamma^{[k+1]} \gamma^{[k+1]\prime},$$

$$\Sigma^{[k+1]} = \frac{|S|^{1/d} \, \Psi}{|\Psi|^{1/d}}.$$


(5) Set

$$\theta^{[k,2]} = (\lambda^{[k]}, \chi^{[k]}, \psi^{[k]}, \mu^{[k+1]}, \Sigma^{[k+1]}, \gamma^{[k+1]})'.$$

Calculate weights $\delta_i^{[k,2]}$, $\eta_i^{[k,2]}$ and $\xi_i^{[k,2]}$ using (15.5), (15.4) and information
in Section A.2.5.


(6) Maximize $Q_2(\lambda, \chi, \psi; \theta^{[k,2]})$ in (15.6) with respect to $\lambda$, $\chi$ and $\psi$ to complete
the calculation of $\theta^{[k+1]}$. Increment iteration count $k \to k+1$ and go to step (2).


This algorithm may be easily adapted to fit special cases of the GH distribution. This
involves holding certain parameters fixed throughout and maximizing with respect
to the remaining parameters: for the hyperbolic distribution we set $\lambda = 1$; for the
NIG distribution $\lambda = -\tfrac{1}{2}$; for the $t$ distribution $\psi = 0$; for the VG distribution
$\chi = 0$. In the case of the $t$ and VG distributions, in step (6) we have to work with
the function $Q_2$ that results from assuming an inverse gamma or gamma density
for $h_W$.
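
Steps (3) and (4) of Algorithm 15.1 are closed-form updates and can be sketched directly. In the hypothetical helper below the weights $\delta_i$ and $\eta_i$ are taken as inputs; computing them from (15.4) requires GIG moments (ratios of Bessel functions), which are deliberately left out of this sketch.

```python
import numpy as np

def update_location_dispersion_skewness(X, delta, eta, symmetric=False):
    """Closed-form updates of (mu, Sigma, gamma) for given weights delta, eta."""
    X = np.asarray(X, dtype=float)
    delta = np.asarray(delta, dtype=float)
    eta = np.asarray(eta, dtype=float)
    n, d = X.shape
    x_bar = X.mean(axis=0)
    S = np.cov(X, rowvar=False)
    delta_bar, eta_bar = delta.mean(), eta.mean()

    # Step (3): skewness vector (zero for a symmetric model).
    if symmetric:
        gamma = np.zeros(d)
    else:
        gamma = (delta[:, None] * (x_bar - X)).mean(axis=0) / (delta_bar * eta_bar - 1.0)

    # Step (4): location vector, then dispersion matrix with |Sigma| fixed at |S|.
    mu = ((delta[:, None] * X).mean(axis=0) - gamma) / delta_bar
    R = X - mu
    Psi = (delta[:, None] * R).T @ R / n - eta_bar * np.outer(gamma, gamma)
    Sigma = np.linalg.det(S) ** (1.0 / d) * Psi / np.linalg.det(Psi) ** (1.0 / d)
    return mu, Sigma, gamma
```

Two properties follow from the formulas and are useful sanity checks on an implementation: the determinant of the updated dispersion matrix equals that of $S$, and the updates satisfy $\bar{\eta}\,\gamma = \bar{X} - \mu$.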


15.1.2 Testing for Elliptical Symmetry 


The general problem of this section is to test whether a sample of identically
distributed data vectors $X_1, \ldots, X_n$ has an elliptical distribution $E_d(\mu, \Sigma, \psi)$ for some
$\mu$, $\Sigma$ and generator $\psi$. In all of the methods we require estimates of $\mu$ and $\Sigma$, and
these can be obtained using approaches discussed in Section 6.3.4, such as fitting
$t$ distributions, calculating M-estimates or perhaps using (6.49) in the bivariate case.
We denote the estimates simply by $\hat{\mu}$ and $\hat{\Sigma}$.

In finance we cannot generally assume that the observations are of iid random
vectors, but we assume that they at least have an identical distribution. Note that,
even if the data were independent, the fact that we generally estimate $\mu$ and $\Sigma$ from
the whole data set would introduce dependence in the procedures that we describe
below.


Stable correlation estimates: an exploratory method. An easy exploratory
graphical method can be based on Proposition 6.28. We could attempt to estimate

$$\rho(X \mid h(X) \geq c), \quad h(x) = (x - \hat{\mu})' \hat{\Sigma}^{-1} (x - \hat{\mu}),$$

for various values of $c > 0$. We expect that for elliptically distributed data the
estimates will remain roughly stable over a range of different $c$ values. Of course
the estimates of this correlation should again be calculated using some method that
is more efficient than the standard correlation estimator for heavy-tailed data. The
method is most natural as a bivariate method, and in this case the correlation of
$X \mid h(X) \geq c$ can be estimated by applying the Kendall's tau transform method to
those data points $X_i$ that lie outside the ellipse defined by $h(x) = c$. In Figure 15.1
we give an example with both simulated and real data, neither of which shows any
marked departure from the assumption of stable correlations. The method is of
course exploratory and does not allow us to come to any formal conclusion.

Figure 15.1. Correlations are estimated using the Kendall's tau method for points lying
outside ellipses of progressively larger size (as shown in (a) and (c)). (a), (b) Two thousand
t-distributed data with four degrees of freedom and $\rho = 0.5$. (c), (d) Two thousand daily
log-returns on Microsoft and Intel. Points show estimates for an ellipse that is allowed to
grow until there are only forty points outside; dashed lines show estimates of correlation for
all data.
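
The exploratory estimate for bivariate data can be sketched as follows. The function names are hypothetical; the Kendall's tau transform is taken to be the relation $\rho = \sin(\pi\tau/2)$ that holds for elliptical distributions, and the simple $O(n^2)$ estimator below ignores ties.

```python
import numpy as np

def mahalanobis_sq(X, mu, Sigma):
    """h(x) = (x - mu)' Sigma^{-1} (x - mu) for every row x of X."""
    R = np.asarray(X, dtype=float) - mu
    return np.einsum("ij,ij->i", R @ np.linalg.inv(Sigma), R)

def kendall_tau(x, y):
    """Sample Kendall's tau (no tie correction), O(n^2) but dependency-free."""
    dx = np.sign(x[:, None] - x[None, :])
    dy = np.sign(y[:, None] - y[None, :])
    n = len(x)
    return (dx * dy).sum() / (n * (n - 1))

def tail_correlation(X, mu, Sigma, c):
    """Estimate rho(X | h(X) >= c) for bivariate X via rho = sin(pi * tau / 2)."""
    X = np.asarray(X, dtype=float)
    outside = X[mahalanobis_sq(X, mu, Sigma) >= c]
    return np.sin(np.pi * kendall_tau(outside[:, 0], outside[:, 1]) / 2.0)
```

Evaluating `tail_correlation` on an increasing grid of `c` values, with `mu` and `Sigma` replaced by their estimates, gives the sequence of points that would be plotted against the ellipse size.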


Q-Q plots. The remaining methods that we describe rely on the link (6.41) between
non-singular elliptical and spherical distributions. If $\mu$ and $\Sigma$ were known, then we
would test for elliptical symmetry by testing the data $\{\Sigma^{-1/2}(X_i - \mu): i = 1, \ldots, n\}$
for spherical symmetry. Replacing these parameters by estimates as above, we
consider whether the data

$$\{Y_i = \hat{\Sigma}^{-1/2}(X_i - \hat{\mu}): i = 1, \ldots, n\} \tag{15.7}$$

are consistent with a spherical distribution, while ignoring the effect of estimation
error.

Some graphical methods based on Q-Q plots have been suggested by Li, Fang 
and Zhu (1997) and these are particularly useful for large d. These essentially rely 
on the following result. 


Lemma 15.2. Suppose that $T(Y)$ is a statistic such that, almost surely,

$$T(aY) = T(Y) \quad \text{for every } a > 0. \tag{15.8}$$

Then $T(Y)$ has the same distribution for every spherical vector $Y \sim S_d^{+}(\psi)$.

Proof. From Theorem 6.21 we have $T(Y) \stackrel{d}{=} T(RS)$, and $T(RS) = T(S)$ follows
from (15.8). Since the distribution of $T(Y)$ only depends on $S$ and not $R$, it must
be the same for all $Y \sim S_d^{+}(\psi)$.


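We exploit this result by looking for statistics $T(Y)$ with the property (15.8),
whose distribution we know when $Y \sim N_d(0, I_d)$. Two examples are

$$T_1(Y) = \frac{d^{1/2}\, \bar{Y}}{\sqrt{\frac{1}{d-1} \sum_{i=1}^{d} (Y_i - \bar{Y})^2}}, \quad \bar{Y} = \frac{1}{d} \sum_{i=1}^{d} Y_i, \qquad T_2(Y) = \frac{\sum_{i=1}^{k} Y_i^2}{\sum_{i=1}^{d} Y_i^2}. \tag{15.9}$$

For $Y \sim N_d(0, I_d)$, and hence for $Y \sim S_d^{+}(\psi)$, we have $T_1(Y) \sim t_{d-1}$ and
$T_2(Y) \sim \mathrm{Beta}(\tfrac{1}{2}k, \tfrac{1}{2}(d - k))$.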

Our experience suggests that the beta plot is the more revealing of the resulting 
Q-Q plots. Li, Fang and Zhu (1997) suggest choosing k such that it is roughly equal 
to d — k. In Figure 15.2 we show examples of the Q-Q plots obtained for 2000 
simulated data from a ten-dimensional $t$ distribution with four degrees of freedom 
and for the daily, weekly and monthly return data on ten Dow Jones 30 stocks that 
were analysed in Example 6.3 and Section 6.2.4. The curvature in the plots for daily 
and weekly returns seems to be evidence against the elliptical hypothesis. 
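
The two statistics behind these plots are simple to compute row by row. The sketch below uses hypothetical function names and leaves out the plotting itself; the values returned would be compared with quantiles of the $t_{d-1}$ and $\mathrm{Beta}(\tfrac{1}{2}k, \tfrac{1}{2}(d-k))$ distributions.

```python
import numpy as np

def t_statistic(Y):
    """T_1 in (15.9), computed for every row of Y."""
    Y = np.asarray(Y, dtype=float)
    d = Y.shape[1]
    return np.sqrt(d) * Y.mean(axis=1) / Y.std(axis=1, ddof=1)

def beta_statistic(Y, k):
    """T_2 in (15.9): share of the squared norm carried by the first k components."""
    Y2 = np.asarray(Y, dtype=float) ** 2
    return Y2[:, :k].sum(axis=1) / Y2.sum(axis=1)
```

Both functions are unchanged when a row is multiplied by a positive constant, which is exactly property (15.8).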


Numerical tests. We restrict ourselves to simple ideas for bivariate tests; references
to more general test ideas are found in Notes and Comments. If we neglect the
error involved in estimating location and dispersion, testing for elliptical symmetry
amounts to testing the $Y_i$ data in (15.7) for spherical symmetry. For $i = 1, \ldots, n$,
if we set $R_i = \|Y_i\|$ and $S_i = Y_i / \|Y_i\|$, then under the null hypothesis the $S_i$ data
should be uniformly distributed on the unit sphere $\mathcal{S}^{d-1}$ and the paired data $(R_i, S_i)$
should form realizations of independent pairs.

In the bivariate case, testing for uniformity on the unit circle $\mathcal{S}^{1}$ amounts to a
univariate test of uniformity on $[0, 2\pi]$ for the angles $\Theta_i$ described by the points
$S_i = (\cos \Theta_i, \sin \Theta_i)'$ on the perimeter of the circle; equivalently, we may test the
data $\{U_i := \Theta_i / (2\pi): i = 1, \ldots, n\}$ for uniformity on $[0, 1]$. Neglecting issues
of serial dependence in the data, this may be done, for instance, by a standard
chi-squared goodness-of-fit test (see Rice 1995, p. 241) or a Kolmogorov-Smirnov test
(see Conover 1999). Again neglecting issues of serial dependence, the independence
of the components of the pairs $\{(R_i, U_i): i = 1, \ldots, n\}$ could be examined by
performing a test of association with Spearman's rank correlation coefficient (see,
for example, Conover 1999, pp. 312-328).
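
The ingredients of these bivariate tests can be sketched as below. The helpers are hypothetical and only compute the test statistics; converting them into p-values requires the relevant reference distributions (available in standard statistical software) and is not attempted here.

```python
import numpy as np

def polar_parts(Y):
    """Return radii R_i = ||Y_i|| and angles U_i = Theta_i / (2 pi) in [0, 1)."""
    Y = np.asarray(Y, dtype=float)
    R = np.hypot(Y[:, 0], Y[:, 1])
    theta = np.mod(np.arctan2(Y[:, 1], Y[:, 0]), 2.0 * np.pi)
    return R, theta / (2.0 * np.pi)

def ks_uniform_statistic(u):
    """Kolmogorov-Smirnov distance between the empirical df of u and U(0, 1)."""
    u = np.sort(np.asarray(u, dtype=float))
    n = len(u)
    i = np.arange(1, n + 1)
    return max(np.max(i / n - u), np.max(u - (i - 1) / n))

def spearman_rho(x, y):
    """Spearman's rank correlation (assumes no ties, as for continuous data)."""
    rx = np.argsort(np.argsort(x))
    ry = np.argsort(np.argsort(y))
    return np.corrcoef(rx, ry)[0, 1]
```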

We have performed these tests for the two data sets used in Figure 15.1, these
being 2000 simulated bivariate $t$ data with four degrees of freedom and 2000 daily
log-returns for Intel and Microsoft. In Figure 15.3 the transformed data on the unit
circle $S_i$ and the implied angles $U_i$ on the $[0, 1]$ scale are shown; the dispersion
matrices have been estimated using the construction (6.49) based on Kendall's tau.
Neither of these data sets shows significant evidence against the elliptical hypothesis.
For the bivariate $t$ data the p-values for the chi-squared and Kolmogorov-Smirnov
tests of uniformity and the Spearman's rank test of association are, respectively,
0.99, 0.90 and 0.10. For the stock-return data they are 0.08, 0.12 and 0.19. Note that
simulated data from lightly skewed members of the GH family often fail these tests.

Figure 15.2. Q-Q plots of the beta statistic (15.9) for four data sets with dimension $d = 10$;
we have set $k = 5$. (a) Two thousand simulated observations from a $t$ distribution with four
degrees of freedom. (b) Daily, (c) weekly and (d) monthly returns on Dow Jones stocks as
analysed in Example 6.3 and Section 6.2.4. Daily and weekly returns show evidence against
elliptical symmetry.


Notes and Comments 


EM algorithms for the multivariate GH distribution have been independently pro- 
posed by Protassov (2004) and Neil Shephard (personal communication). Our 
approach is based on EM-type algorithms for fitting the multivariate t distribu- 
tion with unknown degrees of freedom. A good starter reference on this subject is 
Liu and Rubin (1995), where the use of the MCECM algorithm of Meng and Rubin 
(1993) and the ECME algorithm proposed in Liu and Rubin (1994) is discussed. 
Further refinements of these algorithms are discussed in Liu (1997) and Meng and 
van Dyk (1997). 

The Q-Q plots for testing spherical symmetry were suggested by Li, Fang and 
Zhu (1997). There is a large literature on tests of spherical symmetry, including 
Smith (1977), Kariya and Eaton (1977), Beran (1979) and Baringhaus (1991). This 
work is also related to tests of uniformity for directional data: see Mardia (1972), 
Giné (1975) and Prentice (1978). 
Figure 15.3. Illustration of the transformation of bivariate data to points on the unit circle
$\mathcal{S}^{1}$ using the transformation $S_i = Y_i / \|Y_i\|$, where the $Y_i$ data are defined in (15.7); the
angles of these points are then transformed on to the $[0, 1]$ scale, where they can be tested for
uniformity. (a) Two thousand simulated $t$ data with four degrees of freedom. (b) Two thousand
Intel and Microsoft log-returns. Neither shows strong evidence against elliptical symmetry.


15.2 Advanced Archimedean Copula Models 


In Section 7.4.2 we gave a characterization result for Archimedean copula gener- 
ators that can be used to generate copulas in any dimension d (Theorem 7.50). In 
Section 15.2.1 we provide a more general result that characterizes the larger class 
of generators that may be used in a given dimension d. 

The Archimedean copulas discussed so far have all been models for exchange- 
able random vectors. We discuss non-exchangeable, asymmetric extensions of the 
Archimedean family in Section 15.2.2. 


15.2.1 Characterization of Archimedean Copulas 

Recall that multivariate Archimedean copulas take the form

$$C(u_1, \ldots, u_d) = \psi(\psi^{-1}(u_1) + \cdots + \psi^{-1}(u_d)),$$

where $\psi$ is an Archimedean generator function, i.e. a decreasing, continuous, convex
function $\psi: [0, \infty) \to [0, 1]$ satisfying $\psi(0) = 1$ and $\lim_{t \to \infty} \psi(t) = 0$. The
necessary and sufficient condition for $\psi$ to generate a copula in every dimension
$d \geq 2$ is that it should be a completely monotonic function satisfying

$$(-1)^k \frac{d^k}{dt^k} \psi(t) \geq 0, \quad k \in \mathbb{N}, \; t \in (0, \infty).$$

We also recall that these generators may be characterized as the Laplace-Stieltjes
transforms of dfs $G$ on $[0, \infty)$ such that $G(0) = 0$.

While Archimedean copulas with completely monotonic generators can be used
in any dimension, if we are interested in Archimedean copulas in a given dimension
$d$, we can relax the requirement of complete monotonicity and substitute the weaker
requirement of $d$-monotonicity. A generator $\psi$ is $d$-monotonic if it is differentiable
up to order $(d - 2)$ on $(0, \infty)$ with derivatives satisfying

$$(-1)^k \frac{d^k}{dt^k} \psi(t) \geq 0, \quad k = 0, 1, \ldots, d - 2,$$

and if $(-1)^{d-2} \psi^{(d-2)}$ is decreasing and convex.


Theorem 15.3. If $\psi: [0, \infty) \to [0, 1]$ is an Archimedean copula generator, then
the construction (7.46) gives a copula in dimension $d$ if and only if $\psi$ is $d$-monotonic.


Proof. See McNeil and Nešlehová (2009).


If $\psi$ is a $d$-monotonic generator, we write $\psi \in \Psi_d$. Examples of generators in
$\Psi_d$ that are not in $\Psi_{d+1}$ (or $\Psi_\infty$) are $\psi(t) = \max((1 - t)^{d-1}, 0)$ and the Clayton
generator $\psi(t) = \max((1 + \theta t)^{-1/\theta}, 0)$ for $-1/(d - 1) \leq \theta < -1/d$ (see McNeil
and Nešlehová (2009) for details). An elegant way of characterizing the $d$-monotonic
copula generators is in terms of a lesser-known integral transform.

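Let $G$ be a df on $[0, \infty)$ satisfying $G(0) = 0$ and let $X$ be an rv with df $G$. Then,
for $t \geq 0$ and $d \geq 2$, the Williamson $d$-transform of $G$ (or $X$) is the function

$$\mathfrak{W}_d G(t) = \int_0^\infty \max\Bigl(\Bigl(1 - \frac{t}{x}\Bigr)^{d-1}, 0\Bigr) \, dG(x) = E\Bigl(\max\Bigl(\Bigl(1 - \frac{t}{X}\Bigr)^{d-1}, 0\Bigr)\Bigr).$$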


Every Williamson d-transform of a df G satisfying G (0) = 0 is a d-monotonic 
copula generator, and every d-monotonic copula generator is the Williamson d- 
transform of a unique df G. In the same way that a rich variety of copulas for 
arbitrary dimensions can be created by taking Laplace-Stieltjes transforms of par- 
ticular distributions, a rich variety of copulas for dimension d can be created by 
taking Williamson d-transforms. For example, in McNeil and Nešlehová (2010) it 
is shown that by taking X to have a gamma, inverse-gamma or Pareto distribution we 
can obtain interesting families of copulas that give rise to a wide range of Spearman’s 
or Kendall’s rank correlation values. 
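
For a discrete df $G$ the Williamson $d$-transform is a finite sum, which makes a small numerical illustration possible. The helper below is hypothetical; it takes the positive part of $1 - t/x$ before raising it to the power $d - 1$, which is the intended reading of the maximum in the definition (the two readings differ when $d - 1$ is even and $t > x$).

```python
def williamson_transform(t, d, atoms, probs):
    """W_d G(t) for a discrete df G placing mass probs[j] at atoms[j] > 0."""
    total = 0.0
    for x, p in zip(atoms, probs):
        # Positive part first, then the power d - 1.
        total += p * max(1.0 - t / x, 0.0) ** (d - 1)
    return total
```

With all mass at the point 1 the transform reduces to the generator $\max((1 - t)^{d-1}, 0)$ mentioned above; for continuous radial distributions the same function can be averaged over simulated values of $X$ as a Monte Carlo approximation.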

There is an elegant connection between $d$-monotonic copula generators and
simplex distributions, which can form the basis of a sampling algorithm for the
corresponding copulas. A $d$-dimensional random vector $X$ is said to have a simplex
distribution if $X \stackrel{d}{=} R S_d$, where $S_d$ is an rv that is distributed uniformly on the unit
simplex $\mathcal{S}_d = \{x \in \mathbb{R}^d_+ : x_1 + \cdots + x_d = 1\}$ and $R$ is an independent non-negative
scalar rv with df $F_R$ known as the radial distribution of the simplex distribution. A key
theorem shows that the class of $d$-dimensional Archimedean copulas is equivalent
to the class of survival copulas of $d$-dimensional simplex distributions (excluding
those with point mass at the origin).


Theorem 15.4. If $X$ has a simplex distribution with radial distribution $F_R$ satisfying
$F_R(0) = 0$, then the survival copula of $X$ is Archimedean with generator
$\psi = \mathfrak{W}_d F_R$.


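Proof. Let $x \in \mathbb{R}^d_+$ and write $S_d = (S_1, \ldots, S_d)'$. We use the fact that the survival
function of $S_d$ is given by

$$P(S_1 > x_1, \ldots, S_d > x_d) = \max\Bigl(\Bigl(1 - \sum_{i=1}^{d} x_i\Bigr)^{d-1}, 0\Bigr)$$

(Fang, Kotz and Ng 1990, p. 115) to establish that

$$P(X_1 > x_1, \ldots, X_d > x_d) = E\Bigl(P\Bigl(S_1 > \frac{x_1}{R}, \ldots, S_d > \frac{x_d}{R} \Bigm| R\Bigr)\Bigr) = E\Bigl(\max\Bigl(\Bigl(1 - \frac{x_1 + \cdots + x_d}{R}\Bigr)^{d-1}, 0\Bigr)\Bigr) = \psi(x_1 + \cdots + x_d).$$

The marginal survival functions $P(X_i > x) = \psi(x)$ are continuous and strictly
decreasing on $\{x: \psi(x) > 0\}$ and it may be verified that, for any $x \in \mathbb{R}^d_+$, we have
$\psi(x_1 + \cdots + x_d) = \psi(\psi^{-1}(\psi(x_1)) + \cdots + \psi^{-1}(\psi(x_d)))$, so we can write

$$P(X_1 > x_1, \ldots, X_d > x_d) = \psi(\psi^{-1}(\psi(x_1)) + \cdots + \psi^{-1}(\psi(x_d))),$$

which proves the assertion; for more technical details see McNeil and Nešlehová
(2009).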
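
Theorem 15.4 suggests a sampling scheme: simulate $X = R S_d$ and apply the marginal survival function $\psi$ to each component. The sketch below is an illustration under stated assumptions: the function name and its arguments are hypothetical, the uniform vector on the simplex is generated by normalizing iid standard exponential variates, and the worked case takes $R = 1$ almost surely, so that $\psi(t) = \max(1 - t, 0)^{d-1}$.

```python
import numpy as np

def sample_archimedean_via_simplex(n, d, radial_sampler, psi, rng):
    """Draw n points from the Archimedean copula with generator psi = W_d F_R."""
    E = rng.exponential(size=(n, d))
    S = E / E.sum(axis=1, keepdims=True)     # uniform on the unit simplex
    R = radial_sampler(n, rng)               # radial parts, independent of S
    X = R[:, None] * S                       # simplex-distributed vectors
    return psi(X)                            # U_i = psi(X_i)

# Illustration with R = 1 almost surely in dimension d = 3.
d = 3
rng = np.random.default_rng(1)
U = sample_archimedean_via_simplex(
    20000, d,
    radial_sampler=lambda n, rng: np.ones(n),
    psi=lambda t: np.maximum(1.0 - t, 0.0) ** (d - 1),
    rng=rng,
)
```

Other radial distributions only require a different `radial_sampler` together with the matching Williamson $d$-transform passed as `psi`.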


15.2.2 Non-exchangeable Archimedean Copulas 


A copula obtained from construction (7.46) is obviously an exchangeable copula 
conforming to (7.20). While exchangeable bivariate Archimedean copulas are widely 
used in modelling applications, their exchangeable multivariate extensions represent 
a very specialized form of dependence structure and have more limited applications. 
An exception to this is in the area of credit risk, although even here more general mod- 
els with group structures are also needed. It is certainly natural to enquire whether 
there are extensions to the Archimedean class that are not rigidly exchangeable, and 
we devote this section to a short discussion of some possible extensions. 


Asymmetric Archimedean copulas. Let $C_\theta$ be any exchangeable $d$-dimensional
copula. A parametric family of asymmetric copulas $C_{\theta, \alpha_1, \ldots, \alpha_d}$ is then obtained by
setting

$$C_{\theta, \alpha_1, \ldots, \alpha_d}(u_1, \ldots, u_d) = C_\theta(u_1^{\alpha_1}, \ldots, u_d^{\alpha_d}) \prod_{i=1}^{d} u_i^{1 - \alpha_i}, \quad (u_1, \ldots, u_d) \in [0, 1]^d, \tag{15.10}$$

where $0 \leq \alpha_i \leq 1$ for all $i$. Only in the special case $\alpha_1 = \cdots = \alpha_d$ is the
copula (15.10) exchangeable. Note also that when the $\alpha_i$ parameters are 0, $C_{\theta, 0, \ldots, 0}$
is the independence copula, and when the $\alpha_i$ parameters are 1, $C_{\theta, 1, \ldots, 1}$ is simply $C_\theta$.
When $C_\theta$ is an Archimedean copula, we refer to copulas constructed by (15.10) as
asymmetric Archimedean copulas.

Figure 15.4. Pairwise scatterplots of 10 000 simulated points from an asymmetric Gumbel
copula $C^{\mathrm{Gu}}_{4, 0.95, 0.7}$ based on construction (15.10). This is simulated using Algorithm 15.5.

We check that (15.10) defines a copula by constructing a random vector with this 
df and observing that its margins are standard uniform. The construction is presented 
as a simulation algorithm. 


Algorithm 15.5 (asymmetric Archimedean copula).


(1) Generate a random vector $(V_1, \ldots, V_d)$ with df $C_\theta$.


(2) Generate, independently of $(V_1, \ldots, V_d)$, independent standard uniform
variates $\tilde{U}_i$ for $i = 1, \ldots, d$.


(3) Return $U_i = \max(V_i^{1/\alpha_i}, \tilde{U}_i^{1/(1 - \alpha_i)})$ for $i = 1, \ldots, d$.


It may be easily verified that $(U_1, \ldots, U_d)$ have the df (15.10). See Figure 15.4
for an example of simulated data from an asymmetric bivariate copula based on
Gumbel's copula. Note that an alternative copula may be constructed by taking
$(\tilde{U}_1, \ldots, \tilde{U}_d)$ in Algorithm 15.5 to be distributed according to some copula other
than the independence copula.
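
Algorithm 15.5 can be sketched in a few lines once a sampler for the exchangeable base copula is available. As an assumption made purely for convenience, the base copula below is a Clayton copula with $\theta > 0$, simulated with a gamma frailty, rather than the Gumbel copula of Figure 15.4; the function names are hypothetical, and the exponents are restricted to the open interval $(0, 1)$ to avoid division by zero.

```python
import numpy as np

def sample_clayton(n, d, theta, rng):
    """Exchangeable Clayton copula (theta > 0) via a gamma frailty."""
    V = rng.gamma(shape=1.0 / theta, scale=1.0, size=(n, 1))
    E = rng.exponential(size=(n, d))
    return (1.0 + E / V) ** (-1.0 / theta)

def sample_asymmetric(n, alphas, base_sampler, rng):
    """Algorithm 15.5 for exponents alphas, each strictly between 0 and 1."""
    alphas = np.asarray(alphas, dtype=float)
    if np.any(alphas <= 0.0) or np.any(alphas >= 1.0):
        raise ValueError("this sketch requires 0 < alpha_i < 1")
    V = base_sampler(n, len(alphas), rng)            # step (1)
    U_tilde = rng.uniform(size=V.shape)              # step (2)
    return np.maximum(V ** (1.0 / alphas),           # step (3)
                      U_tilde ** (1.0 / (1.0 - alphas)))
```

A call such as `sample_asymmetric(10000, [0.95, 0.7], lambda n, d, rng: sample_clayton(n, d, 2.0, rng), np.random.default_rng(0))` returns an array whose columns have standard uniform margins but are not exchangeable.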


Nested Archimedean copulas. Non-exchangeable, higher-dimensional Archime- 
dean copulas with exchangeable bivariate margins can be constructed by recursive 
application (or nesting) of Archimedean generators and their inverses, and we will 
give examples in this section. The biggest problem with these constructions lies 
in checking that they lead to valid multivariate distributions satisfying (7.1). The 
necessary theory is complicated and we will simply indicate the nature of the condi- 
tions that are necessary without providing justification; a comprehensive reference 
is Joe (1997). It turns out that with some care we can construct situations of partial 
exchangeability. We give three- and four-dimensional examples that indicate the
pattern of construction. 


Example 15.6 (three-dimensional non-exchangeable Archimedean copulas). 
Suppose that $\psi_1$ and $\psi_2$ are two Archimedean copula generators and consider

$$C(u_1, u_2, u_3) = \psi_2\bigl(\psi_2^{-1} \circ \psi_1(\psi_1^{-1}(u_1) + \psi_1^{-1}(u_2)) + \psi_2^{-1}(u_3)\bigr). \tag{15.11}$$

Conditions that ensure that this is a copula are that the generators $\psi_1$ and
$\psi_2$ are completely monotonic functions, as in (7.47), and that the composition
$\psi_2^{-1} \circ \psi_1 : [0, \infty) \to [0, \infty)$ is a function whose derivative is a completely
monotonic function.

Observe that when $\psi_2 = \psi_1 = \psi$ we are back in the situation of full
exchangeability, as in (7.46). Otherwise, if $\psi_1 \ne \psi_2$ and $(U_1, U_2, U_3)$ is a random
vector with df given by (15.11), then only $U_1$ and $U_2$ are exchangeable,
i.e. $(U_1, U_2, U_3) \stackrel{d}{=} (U_2, U_1, U_3)$, but no other swapping of subscripts is possible.
All bivariate margins of (15.11) are themselves Archimedean copulas. The margins
$C_{13}$ and $C_{23}$ have generator $\psi_2$ and $C_{12}$ has generator $\psi_1$.
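
As a small numerical illustration of Example 15.6, the sketch below evaluates (15.11) when both generators are of Clayton type, $\psi_k(t) = (1+t)^{-1/\theta_k}$. This choice of family is an assumption made for the example. In that case $\psi_2^{-1} \circ \psi_1(t) = (1+t)^{\theta_2/\theta_1} - 1$, whose derivative is completely monotonic when $\theta_1 \ge \theta_2$, so the code insists on that ordering. The function names are hypothetical.

```python
def psi(t, theta):
    """Clayton generator psi(t) = (1 + t)^(-1/theta)."""
    return (1.0 + t) ** (-1.0 / theta)

def psi_inv(u, theta):
    """Inverse generator psi^{-1}(u) = u^(-theta) - 1."""
    return u ** (-theta) - 1.0

def nested_clayton3(u1, u2, u3, theta1, theta2):
    """Evaluate (15.11) with Clayton generators psi_1 (inner) and psi_2 (outer)."""
    if not theta1 >= theta2 > 0.0:
        raise ValueError("nesting condition assumed here: theta1 >= theta2 > 0")
    inner = psi(psi_inv(u1, theta1) + psi_inv(u2, theta1), theta1)
    return psi(psi_inv(inner, theta2) + psi_inv(u3, theta2), theta2)

def clayton2(u, v, theta):
    """Bivariate Clayton copula, used to check the bivariate margins."""
    return (u ** (-theta) + v ** (-theta) - 1.0) ** (-1.0 / theta)

# Setting one argument to 1 recovers the bivariate margins:
c12 = nested_clayton3(0.3, 0.6, 1.0, theta1=3.0, theta2=1.0)  # generator psi_1
c13 = nested_clayton3(0.3, 1.0, 0.8, theta1=3.0, theta2=1.0)  # generator psi_2
c23 = nested_clayton3(1.0, 0.6, 0.8, theta1=3.0, theta2=1.0)  # generator psi_2
```

The three margins agree with `clayton2` evaluated at $\theta_1$, $\theta_2$ and $\theta_2$ respectively, matching the statement that $C_{12}$ has generator $\psi_1$ while $C_{13}$ and $C_{23}$ have generator $\psi_2$.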


Example 15.7 (four-dimensional non-exchangeable Archimedean copulas). A 
possible four-dimensional construction is 


$$C(u_1, u_2, u_3, u_4) = \psi_3\bigl(\psi_3^{-1} \circ \psi_1(\psi_1^{-1}(u_1) + \psi_1^{-1}(u_2)) + \psi_3^{-1} \circ \psi_2(\psi_2^{-1}(u_3) + \psi_2^{-1}(u_4))\bigr), \tag{15.12}$$

where $\psi_1$, $\psi_2$ and $\psi_3$ are three distinct, completely monotonic Archimedean copula
generators and we assume that the composite functions $\psi_3^{-1} \circ \psi_1$ and $\psi_3^{-1} \circ \psi_2$ have
derivatives that are completely monotonic to obtain a proper distribution function.
This is not the only possible four-dimensional construction (Joe 1997), but it is a
useful construction because it gives two exchangeable groups. If $(U_1, U_2, U_3, U_4)$
has the df (15.12), then $U_1$ and $U_2$ are exchangeable, as are $U_3$ and $U_4$.


The same kinds of construction can be extended to higher dimensions, subject 
again to complete monotonicity conditions on the compositions of generator inverses 
and generators (see Notes and Comments). 


LT-Archimedean copulas with p-factor structure. Recall from Definition 7.52 the 
family of LT-Archimedean copulas. The arguments of Proposition 7.51 establish 
that these have the form 


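$$C(u_1, \dots, u_d) = \hat G\bigl(\hat G^{-1}(u_1) + \cdots + \hat G^{-1}(u_d)\bigr) = E\Bigl(\exp\Bigl(-V \sum_{i=1}^{d} \hat G^{-1}(u_i)\Bigr)\Bigr) \tag{15.13}$$

for a strictly positive rv $V$ with Laplace-Stieltjes transform $\hat G$.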
It is possible to generalize the construction (15.13) to obtain a larger family of 
non-exchangeable copulas. An LT-Archimedean copula with p-factor structure is 
constructed from a p-dimensional random vector $V = (V_1, \dots, V_p)'$ with independent,
strictly positive components and a matrix $A \in \mathbb{R}^{d \times p}$ with elements $a_{ij} > 0$ as
follows:

$$C(u_1, \dots, u_d) = E\Bigl(\exp\Bigl(-\sum_{i=1}^{d} a_i' V \, \hat G_i^{-1}(u_i)\Bigr)\Bigr), \tag{15.14}$$

where $a_i$ is the $i$th row of $A$ and $\hat G_i$ is the Laplace-Stieltjes transform of the strictly
positive rv $a_i' V$.

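We can write (15.14) in a different way, which facilitates the computation of
$C(u_1, \dots, u_d)$. Note that

$$\sum_{i=1}^{d} a_i' V \, \hat G_i^{-1}(u_i) = \sum_{j=1}^{p} V_j \sum_{i=1}^{d} a_{ij} \hat G_i^{-1}(u_i).$$

It follows from the independence of the $V_j$ that

$$C(u_1, \dots, u_d) = \prod_{j=1}^{p} E\Bigl(\exp\Bigl(-V_j \sum_{i=1}^{d} a_{ij} \hat G_i^{-1}(u_i)\Bigr)\Bigr) = \prod_{j=1}^{p} \hat G_{V_j}\Bigl(\sum_{i=1}^{d} a_{ij} \hat G_i^{-1}(u_i)\Bigr). \tag{15.15}$$

Note that (15.15) is fairly easy to evaluate when $\hat G_{V_j}$, the Laplace-Stieltjes transform
of the $V_j$, is available in closed form, because $\hat G_i(t) = \prod_{j=1}^{p} \hat G_{V_j}(a_{ij} t)$ by the
independence of the $V_j$.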
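
To make the evaluation of (15.15) concrete, the following sketch assumes (purely for illustration) that each factor $V_j$ is gamma distributed with unit scale, so that $\hat G_{V_j}(t) = (1+t)^{-k_j}$ in closed form. The inverse $\hat G_i^{-1}$ is not available in closed form in general, so it is obtained here by bisection; the function names and the parameter values are hypothetical.

```python
def lt_gamma(t, shape):
    """Laplace-Stieltjes transform of a Gamma(shape, 1) rv: (1 + t)^(-shape)."""
    return (1.0 + t) ** (-shape)

def g_i(t, a_row, shapes):
    """LT of a_i'V: product over j of the LT of V_j evaluated at a_ij * t."""
    out = 1.0
    for a_ij, k in zip(a_row, shapes):
        out *= lt_gamma(a_ij * t, k)
    return out

def g_i_inv(u, a_row, shapes, tol=1e-13):
    """Invert the decreasing function t -> g_i(t) by bisection, for 0 < u <= 1."""
    if u >= 1.0:
        return 0.0
    lo, hi = 0.0, 1.0
    while g_i(hi, a_row, shapes) > u:     # grow the bracket until it holds the root
        hi *= 2.0
    while hi - lo > tol * (1.0 + hi):
        mid = 0.5 * (lo + hi)
        if g_i(mid, a_row, shapes) > u:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def p_factor_copula(u, A, shapes):
    """Evaluate (15.15) at u = (u_1, ..., u_d); A is a list of d rows of length p."""
    t = [g_i_inv(u_i, row, shapes) for u_i, row in zip(u, A)]
    c = 1.0
    for j, k in enumerate(shapes):
        c *= lt_gamma(sum(row[j] * t_i for row, t_i in zip(A, t)), k)
    return c

A = [[1.0, 0.5], [0.2, 2.0], [1.5, 1.0]]   # d = 3 variables, p = 2 factors
shapes = [2.0, 0.5]                         # gamma shape parameters k_1, k_2
value = p_factor_copula([0.4, 0.7, 0.9], A, shapes)
```

With a single factor and a common loading the construction collapses to an ordinary LT-Archimedean copula of the form (15.13), which gives a convenient check on the code.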


Notes and Comments 


The relationship between d-monotonic functions and Archimedean copulas in 
dimension d, as well as the link to simplex distributions, is developed in McNeil and 
Nešlehová (2009); see also McNeil and Nešlehová (2010), which provides many
examples of such copulas and generalizes the theory to so-called Liouville copulas, 
the survival copulas of Liouville distributions. 

For more details on the asymmetric copulas obtained from construction (15.10) 
and ideas for more general asymmetric copulas see Genest, Ghoudi and Rivest 
(1998). These copula classes were introduced in the PhD thesis of Khoudraji (1995). 
For additional theory concerning nested higher-dimensional Archimedean copulas 
and sampling algorithms see Joe (1997), McNeil (2008), Hofert (2008, 2011, 2012) 
and Hofert and Mächler (2011). LT-Archimedean copulas with p-factor structure
were proposed by Rogge and Schönbucher (2003) with applications in credit risk in
mind. Krupskii and Joe (2013) extend the idea to suggest even more general ways 
of constructing factor copula models. 
Advanced Topics in Extreme Value Theory


This chapter extends the treatment of EVT in Chapter 5. In Section 16.1 we pro- 
vide more information about the tails of some of the distributions and models that 
are prominent in this book, including the tails of normal variance mixture models 
(Chapter 6) and strictly stationary GARCH models (Chapter 4). 

In Section 16.2 we build on the point process framework for describing the occur- 
rence and magnitude of threshold exceedances (the POT model) that was described 
in Section 5.3. We show how self-exciting processes for extremes may be developed 
based on the idea of Hawkes processes. 

Sections 16.3 and 16.4 provide a concise summary of the more important ideas in 
multivariate EVT; they deal, respectively, with multivariate maxima and multivariate 
threshold exceedances. The novelty of these sections is that the ideas are presented 
as far as possible using the copula methodology of Chapter 7. The style is similar 
to Sections 5.1 and 5.2, with the main results being mostly stated without proof and 
an emphasis being given to examples relevant for applications. 


16.1 Tails of Specific Models 


In this section we survey the tails of some of the more important distributions and 
models that we have encountered in this book. 


16.1.1 Domain of Attraction of the Fréchet Distribution 


As stated in Section 5.1.2, the domain of attraction of the Fréchet distribution consists
of distributions with regularly varying tails of the form $\bar F(x) = x^{-\alpha} L(x)$ for $\alpha > 0$,
where $\alpha$ is known as the tail index. These are heavy-tailed models where higher-order
moments cease to exist. Normalized maxima of random samples from such
distributions converge to a Fréchet distribution with shape parameter $\xi = 1/\alpha$,
and excesses over sufficiently high thresholds converge to a generalized Pareto
distribution with shape parameter $\xi = 1/\alpha$.

We now show that the Student t distribution and the inverse gamma distribution
are in this class; we analyse the former because of its general importance in financial
modelling and the latter because it appears as the mixing distribution that yields the
Student t in the class of normal variance mixture models (see Example 6.7). In
Section 16.1.3 we will see that the mixing distribution in a normal variance mixture
model essentially determines the tail of that model.


Both the t and inverse gamma distributions are presented in terms of their density,
and the analysis of their tails proves to be a simple application of a useful result
known as Karamata's Theorem, which is given in Section A.1.4.


Example 16.1 (Student t distribution). It is easily verified that the standard univariate
t distribution with $\nu > 1$ has a density of the form $f_\nu(x) = x^{-(\nu+1)} L(x)$,
where $L$ is a slowly varying function. Karamata's Theorem (see Theorem A.7) therefore
allows us to calculate the form of the tail $\bar F_\nu(x) = \int_x^\infty f_\nu(y)\,dy$ by essentially
treating the slowly varying function as a constant and taking it out of the integral.
We get

$$\bar F_\nu(x) = \int_x^\infty y^{-(\nu+1)} L(y)\,dy \sim \nu^{-1} x^{-\nu} L(x), \quad x \to \infty,$$

from which we conclude that the df $F_\nu$ of a t distribution has tail index $\nu$ and
$F_\nu \in \mathrm{MDA}(H_{1/\nu})$ by Theorem 5.8.


Example 16.2 (inverse gamma distribution). The density of the inverse gamma
distribution is given in (A.17). It is of the form $f_{\alpha,\beta}(x) = x^{-(\alpha+1)} L(x)$, since
$e^{-\beta/x} \to 1$ as $x \to \infty$. Using the same technique as in Example 16.1, we deduce
that this distribution has tail index $\alpha$, so $F_{\alpha,\beta} \in \mathrm{MDA}(H_{1/\alpha})$.


16.1.2 Domain of Attraction of the Gumbel Distribution 


The Gumbel class consists of the so-called von Mises distribution functions and any
other distributions that are tail equivalent to von Mises distributions (see Embrechts,
Klüppelberg and Mikosch 1997, pp. 138-150). We give the definitions of both of
these concepts below. Note that distributions in this class can have both infinite
and finite right endpoints; again we write $x_F = \sup\{x \in \mathbb{R} : F(x) < 1\} \le \infty$ for
the right endpoint of $F$.


Definition 16.3 (von Mises distribution function). Suppose there exists some
$z < x_F$ such that $\bar F$ has the representation

$$\bar F(x) = c \exp\Bigl(-\int_z^x \frac{1}{a(t)}\,dt\Bigr), \quad z < x < x_F,$$

where $c$ is some positive constant, $a(t)$ is a positive and absolutely continuous
function with derivative $a'$, and $\lim_{x \to x_F} a'(x) = 0$. Then $F$ is called a von Mises
distribution function.


Definition 16.4 (tail equivalence). Two dfs $F$ and $G$ are called tail equivalent if
they have the same right endpoints (i.e. $x_F = x_G$) and $\lim_{x \to x_F} \bar F(x)/\bar G(x) = c$ for
some constant $0 < c < \infty$.


To decide whether a particular df $F$ is a von Mises df, the following condition is
extremely useful. Assume there exists some $z < x_F$ such that $F$ is twice differentiable
on $(z, x_F)$ with density $f = F'$ and $F'' < 0$ in $(z, x_F)$. Then $F$ is a von Mises
df if and only if

$$\lim_{x \to x_F} \frac{\bar F(x) F''(x)}{f^2(x)} = -1. \tag{16.1}$$

We now use this condition to show that the gamma df is a von Mises df.
Example 16.5 (gamma distribution). The density $f = f_{\alpha,\beta}$ of the gamma distribution
is given in (A.13), and a straightforward calculation yields $F''(x) = f'(x) = -f(x)(\beta + (1 - \alpha)/x) < 0$,
provided $x > \max((\alpha - 1)/\beta, 0)$.
Clearly, $\lim_{x \to \infty} F''(x)/f(x) = -\beta$. Moreover, using L'Hôpital's rule we get
$\lim_{x \to \infty} \bar F(x)/f(x) = \lim_{x \to \infty} -f(x)/f'(x) = \beta^{-1}$. Combining these two limits
establishes (16.1).
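
The limit (16.1) can be checked numerically in a case where everything is available in closed form. The sketch below assumes a gamma distribution with shape parameter 2 and rate $\beta$, for which $\bar F(x) = (1 + \beta x)e^{-\beta x}$ and $f(x) = \beta^2 x e^{-\beta x}$; the shape value and the function name are choices made for this illustration only.

```python
import math

def von_mises_ratio(x, beta):
    """Fbar(x) * F''(x) / f(x)^2 for the gamma df with shape 2 and rate beta."""
    sf = (1.0 + beta * x) * math.exp(-beta * x)                    # survival function
    f = beta ** 2 * x * math.exp(-beta * x)                        # density
    f_prime = beta ** 2 * (1.0 - beta * x) * math.exp(-beta * x)   # F'' = f'
    return sf * f_prime / f ** 2

# The ratio simplifies to 1 / (beta * x)^2 - 1, which tends to -1 as x grows.
ratios = [von_mises_ratio(x, beta=1.5) for x in (5.0, 20.0, 80.0)]
```

In this special case the ratio equals $(\beta x)^{-2} - 1$ exactly, so the convergence to $-1$ required by (16.1) is visible directly.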


Example 16.6 (GIG distribution). The density of an rv $X \sim N^-(\lambda, \chi, \psi)$ with
the GIG distribution is given in (A.14). Let $F_{\lambda,\chi,\psi}(x)$ denote the df and consider
the case where $\psi > 0$. If $\psi = 0$, then the GIG is an inverse gamma distribution,
which was shown in Example 16.2 to be in the Fréchet class. If $\psi > 0$, then $\lambda \ge 0$,
and a similar technique to Example 16.5 could be used to establish that the GIG is
a von Mises df. In the case where $\lambda > 0$ it is easier to demonstrate tail equivalence
with a gamma distribution, which is the special case when $\chi = 0$. We observe that

$$\lim_{x \to \infty} \frac{\bar F_{\lambda,\chi,\psi}(x)}{\bar F_{\lambda,0,\psi}(x)} = \lim_{x \to \infty} \frac{f_{\lambda,\chi,\psi}(x)}{f_{\lambda,0,\psi}(x)} = c_{\lambda,\chi,\psi}$$

for some constant $0 < c_{\lambda,\chi,\psi} < \infty$. It follows that $F_{\lambda,\chi,\psi} \in \mathrm{MDA}(H_0)$.


16.1.3 Mixture Models 


In this book we have considered a number of models for financial risk-factor changes 
that arise as mixtures (or products) of rvs. In Chapter 6 we introduced multivariate 
normal variance mixture models including the Student t and (symmetric) GH distributions,
which have the general structure given in (6.18). A one-dimensional normal
variance mixture (or the marginal distribution of a d-dimensional normal variance
mixture) is of the same type (see Section A.1.1) as an rv $X$ satisfying

$$X \stackrel{d}{=} \sqrt{W} Z, \tag{16.2}$$

where $Z \sim N(0, 1)$ and $W$ is an independent, positive-valued scalar rv. We would
like to know more about the tails of distributions satisfying (16.2).

More generally, to understand the tails of the marginal distributions of elliptical 
distributions it suffices to consider spherical distributions, which have the stochastic 
representation 


$$X \stackrel{d}{=} R S \tag{16.3}$$

for a random vector $S$ that is uniformly distributed on the unit sphere
$\mathcal{S}^{d-1} = \{s \in \mathbb{R}^d : s's = 1\}$, and for an independent radial variate $R$ (see Section 6.3.1, and
Theorem 6.21 in particular). Again we would like to know more about the tails of
the marginal distributions of the vector $X$ in (16.3).

In Section 4.2 we considered strictly stationary stochastic processes $(X_t)$, such
as GARCH processes satisfying equations of the form

$$X_t = \sigma_t Z_t, \tag{16.4}$$

where $(Z_t)$ are strict white noise innovations, typically with a Gaussian distribution
or (more realistically) a scaled Student t distribution, and where $\sigma_t$ is an
$\mathcal{F}_{t-1}$-measurable rv representing volatility. These models can also be seen as mixture
models and we would like to know something about the tail of the stationary
distribution of $(X_t)$.

A useful result for analysing the tails of mixtures is the following theorem due to 
Breiman (1965), which we immediately apply to spherical distributions. 


Theorem 16.7 (tails of mixture distributions). Let $X$ be given by $X = YZ$ for
independent, non-negative rvs $Y$ and $Z$ such that


(1) $Y$ has a regularly varying tail with tail index $\alpha$;


(2) $E(Z^{\alpha + \varepsilon}) < \infty$ for some $\varepsilon > 0$.

Then $X$ has a regularly varying tail with tail index $\alpha$ and

$$P(X > x) \sim E(Z^\alpha) P(Y > x), \quad x \to \infty.$$
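
A toy case in which the relation of Theorem 16.7 can be verified by direct calculation is sketched below. It assumes, for illustration only, that $Y$ is Pareto with $P(Y > y) = y^{-\alpha}$ for $y \ge 1$ and that $Z$ takes finitely many positive values, so that condition (2) holds trivially; the numerical values and function names are hypothetical.

```python
def pareto_sf(y, alpha):
    """P(Y > y) for a Pareto rv with P(Y > y) = y^(-alpha) on y >= 1."""
    return 1.0 if y < 1.0 else y ** (-alpha)

def product_sf(x, alpha, z_values, z_probs):
    """Exact P(YZ > x) for independent Y (Pareto) and discrete positive Z."""
    return sum(p * pareto_sf(x / z, alpha) for z, p in zip(z_values, z_probs))

alpha = 3.0
z_values, z_probs = [0.5, 1.0, 4.0], [0.2, 0.5, 0.3]
moment = sum(p * z ** alpha for z, p in zip(z_values, z_probs))   # E(Z^alpha)

x = 50.0
# Ratio of P(YZ > x) to E(Z^alpha) * P(Y > x); Theorem 16.7 says it tends to 1.
ratio = product_sf(x, alpha, z_values, z_probs) / (moment * pareto_sf(x, alpha))
```

In this special case the two sides agree exactly once $x$ exceeds the largest value of $Z$; for general $Y$ and $Z$ the theorem gives only asymptotic equivalence.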


Proposition 16.8 (tails of spherical distributions). Let $X \stackrel{d}{=} RS \sim S_d(\psi)$ have
a spherical distribution. If $R$ has a regularly varying tail with tail index $\alpha$, then so
does $|X_i|$ for $i = 1, \dots, d$. If $E(R^k) < \infty$ for all $k > 0$, then $|X_i|$ does not have a
regularly varying tail.


Proof. Suppose that $R$ has a regularly varying tail with tail index $\alpha$ and consider
$R S_i$. Since $|S_i|$ is a non-negative rv with finite support $[0, 1]$ and finite moments, it
follows from Theorem 16.7 that $R|S_i|$, and hence $|X_i|$, are regularly varying with
tail index $\alpha$. If $E(R^k) < \infty$ for all $k > 0$, then $E|X_i|^k < \infty$ for all $k > 0$, so that
$|X_i|$ cannot have a regularly varying tail.


Example 16.9 (tails of normal variance mixtures). Suppose that $X \stackrel{d}{=} \sqrt{W} Z$ with
$Z \sim N_d(0, I_d)$ and $W$ an independent scalar rv, so that both $Z$ and $X$ have spherical
distributions and $X$ has a normal variance mixture distribution. The vector $Z$ has the
spherical representation $Z \stackrel{d}{=} \tilde R S$, where $\tilde R^2 \sim \chi^2_d$ (see Example 6.23). The vector
$X$ has the spherical representation $X \stackrel{d}{=} R S$, where $R \stackrel{d}{=} \sqrt{W} \tilde R$.

Now, the chi-squared distribution (being a gamma distribution) is in the domain of
attraction of the Gumbel distribution, so $E(\tilde R^k) = E((\tilde R^2)^{k/2}) < \infty$ for all $k > 0$.
We first consider the case when $W$ has a regularly varying tail with tail index $\alpha$ so that
$\bar F_W(w) = L(w) w^{-\alpha}$. It follows that $P(\sqrt{W} > x) = P(W > x^2) = L_2(x) x^{-2\alpha}$,
where $L_2(x) := L(x^2)$ is also slowly varying, so that $\sqrt{W}$ has a regularly varying
tail with tail index $2\alpha$. By Theorem 16.7, $R \stackrel{d}{=} \sqrt{W} \tilde R$ also has a regularly varying
tail with tail index $2\alpha$, and by Proposition 16.8, so do the components of $|X|$.

To consider a particular case, suppose that $W \sim \mathrm{Ig}(\tfrac{1}{2}\nu, \tfrac{1}{2}\nu)$, so that, by
Example 16.2, $W$ is regularly varying with tail index $\tfrac{1}{2}\nu$. Then $\sqrt{W}$ has a regularly
varying tail with tail index $\nu$ and so does $|X_i|$; this is hardly surprising because
$X \sim t_d(\nu, 0, I_d)$, implying that $X_i$ has a univariate Student t distribution with $\nu$
degrees of freedom, and we already know from Example 16.1 that this has tail
index $\nu$.

On the other hand, if $F_W \in \mathrm{MDA}(H_0)$, then $E(R^k) < \infty$ for all $k > 0$ and $|X_i|$
cannot have a regularly varying tail by Proposition 16.8. This means, for example,
that univariate GH distributions do not have power tails (except for the special
boundary case corresponding to Student t) because the GIG is in the maximum
domain of attraction of the Gumbel distribution, as was shown in Example 16.6.


Table 16.1. Approximate theoretical values of the tail index $\kappa$ solving (16.5) for various
GARCH(1, 1) processes with Gaussian and Student t innovation distributions.

                                               t distribution
Parameters                        Gauss      $\nu = 8$   $\nu = 4$

$\alpha_1 = 0.2$,  $\beta = 0.75$       4.4        3.5        2.7
$\alpha_1 = 0.1$,  $\beta = 0.85$       9.1        5.8        3.4
$\alpha_1 = 0.05$, $\beta = 0.95$      21.1        7.9        3.9


Analysis of the tails of the stationary distribution of GARCH-type models is more
challenging. In view of Theorem 16.7 and the foregoing examples, it is clear that
when the innovations $(Z_t)$ are Gaussian, then the law of the process $(X_t)$ in (16.4)
will have a regularly varying tail if the volatility $\sigma_t$ has a regularly varying tail.
Mikosch and Stărică (2000) analyse the GARCH(1, 1) model (see Definition 4.20),
where the squared volatility satisfies $\sigma_t^2 = \alpha_0 + \alpha_1 X_{t-1}^2 + \beta \sigma_{t-1}^2$. They show
that under relatively weak conditions on the innovation distribution of $(Z_t)$, the
volatility $\sigma_t$ has a regularly varying tail with tail index $\kappa$ given by the solution of
the equation

$$E\bigl((\alpha_1 Z_t^2 + \beta)^{\kappa/2}\bigr) = 1. \tag{16.5}$$

In Table 16.1 we have calculated approximate values of $\kappa$ for various innovation
distributions and parameter values using numerical integration and root-finding
procedures. By Theorem 16.7 these are the values of the tail index for the stationary
distribution of the GARCH(1, 1) model itself.

Two main findings are obvious: for any fixed set of parameter values, the tail
index gets smaller and the tail of the GARCH model gets heavier as we move to
heavier-tailed innovation distributions; for any fixed innovation distribution, the tail
of the GARCH model gets lighter as we decrease the ARCH effect ($\alpha_1$) and increase
the GARCH effect ($\beta$).
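
The numerical integration and root-finding mentioned above can be sketched as follows for Gaussian innovations. This is a minimal illustration and not the procedure used to produce Table 16.1: the expectation in (16.5) is approximated by the trapezoidal rule on a truncated grid, the non-trivial root is found by bisection, and the sketch assumes $\alpha_1 + \beta < 1$ so that the root satisfies $\kappa > 2$. The function names and the numerical settings are hypothetical.

```python
import math

def moment(k, alpha1, beta, z_max=15.0, n=6000):
    """E((alpha1 * Z^2 + beta)^k) for Z ~ N(0, 1), by the trapezoidal rule."""
    h = 2.0 * z_max / n
    total = 0.0
    for i in range(n + 1):
        z = -z_max + i * h
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * (alpha1 * z * z + beta) ** k * math.exp(-0.5 * z * z)
    return total * h / math.sqrt(2.0 * math.pi)

def garch_tail_index(alpha1, beta, tol=1e-6):
    """Solve E((alpha1 Z^2 + beta)^(kappa/2)) = 1 for kappa > 2 by bisection."""
    if alpha1 + beta >= 1.0:
        raise ValueError("this sketch assumes alpha1 + beta < 1")
    lo, hi = 1.0, 2.0                        # moment(1) = alpha1 + beta < 1
    while moment(hi, alpha1, beta) < 1.0:    # expand until the root is bracketed
        lo, hi = hi, 2.0 * hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if moment(mid, alpha1, beta) < 1.0:
            lo = mid
        else:
            hi = mid
    return lo + hi                           # kappa = 2k with k = (lo + hi) / 2

kappa = garch_tail_index(0.2, 0.75)          # compare with the Gauss column of Table 16.1
```

The trivial solution $\kappa = 0$ is excluded by starting the search at $k = 1$, where the expectation equals $\alpha_1 + \beta < 1$.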


Tail dependence in elliptical distributions. We close this section by giving a result
that reveals an interesting connection between tail dependence in elliptical distributions
and regular variation of the radial rv $R$ in the representation $X \stackrel{d}{=} \mu + RAS$
of an elliptically symmetric distribution given in Proposition 6.27.


Theorem 16.10. Let $X \stackrel{d}{=} \mu + RAS \sim E_d(\mu, \Sigma, \psi)$, where $\mu$, $R$, $A$ and $S$ are as
in Proposition 6.27 and we assume that $\sigma_{ii} > 0$ for all $i = 1, \dots, d$. If $R$ has a
regularly varying tail with tail index $\alpha > 0$, then the coefficient of upper and lower
tail dependence between $X_i$ and $X_j$ is given by

$$\lambda(X_i, X_j) = \frac{\int_{(\pi/2 - \arcsin \rho_{ij})/2}^{\pi/2} \cos^\alpha(t)\,dt}{\int_0^{\pi/2} \cos^\alpha(t)\,dt}, \tag{16.6}$$

where $\rho_{ij}$ is the $(i, j)$th element of $P = \wp(\Sigma)$ and $\wp$ is the correlation operator
defined in (6.5).
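
Formula (16.6) is straightforward to evaluate by numerical quadrature. The sketch below uses a simple midpoint rule; the function names and the number of quadrature nodes are assumptions made for illustration.

```python
import math

def cos_power_integral(a, b, alpha, n=20000):
    """Midpoint-rule approximation of the integral of cos(t)^alpha over [a, b]."""
    h = (b - a) / n
    return h * sum(math.cos(a + (i + 0.5) * h) ** alpha for i in range(n))

def tail_dependence(rho, alpha):
    """Coefficient (16.6) for correlation rho and radial tail index alpha > 0."""
    lower = (math.pi / 2.0 - math.asin(rho)) / 2.0
    numerator = cos_power_integral(lower, math.pi / 2.0, alpha)
    denominator = cos_power_integral(0.0, math.pi / 2.0, alpha)
    return numerator / denominator

lam = tail_dependence(rho=0.5, alpha=4.0)
```

For $\alpha = 1$ the integrals are elementary and (16.6) reduces to $1 - \sin((\pi/2 - \arcsin\rho)/2)$, which gives a quick check of the quadrature; the coefficient increases with $\rho$ and equals 1 when $\rho = 1$.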


An example where $R$ has a regularly varying tail occurs in the case of the multivariate
t distribution $X \sim t_d(\nu, \mu, \Sigma)$. It is obvious from the arguments used in
Example 16.9 that the tail of the df of $R$ is regularly varying with tail index $\alpha = \nu$.
Thus (16.6) with $\alpha$ replaced by $\nu$ gives an alternative expression to (7.38) for
calculating tail-dependence coefficients for the t copula $C^t_{\nu,P}$.

Arguably, the original expression (7.38) is easier to work with, since the df of
a univariate t distribution is available in statistical software packages. Moreover,
the equivalence of the two formulas allows us to conclude that we can use (7.38)
to evaluate tail-dependence coefficients for any bivariate elliptical distribution with
correlation parameter $\rho$ when the distribution of the radial rv $R$ has a regularly
varying tail with tail index $\nu$.


Notes and Comments 


Section 16.1 is a highly selective account tailored to the study of a number of very
specific models, and all of the theoretical subjects touched upon (regular variation,
von Mises distributions, tails of products, tails of stochastic recurrence equations)
can be studied in much greater detail.

For more about regular variation, slow variation and Karamata’s Theorem see 
Bingham, Goldie and Teugels (1987) and Seneta (1976). A summary of the more 
important ideas with regard to the study of extremes is found in Resnick (2008). 
Section 16.1.2, with the exception of the examples, is taken from Embrechts, 
Klüppelberg and Mikosch (1997), and detailed references to results on von Mises
distributions and the maximum domain of attraction of the Gumbel distribution are 
found therein. 

Theorem 16.7 follows from results of Breiman (1965). Related results on distri- 
butions of products are found in Embrechts and Goldie (1980). The discussion of 
tails of GARCH models is based on Mikosch and Stărică (2000); the theory involves
the study of stochastic recurrence relations and is essentially due to Kesten (1973). 
See also Mikosch (2003) for an excellent introduction to these ideas. 

The formula for tail-dependence coefficients in elliptical distributions when the 
radial rv has a regularly varying tail is taken from Hult and Lindskog (2002). Similar 
results were derived independently by Schmidt (2002); see also Frahm, Junker and 
Szimayer (2003) for a discussion of the applicability of such results to financial 
returns. 


16.2 Self-exciting Models for Extremes 


The models described in this section build on the point process formulation of the 
POT model for threshold exceedances in Section 5.3. However, instead of assuming 
Poisson behaviour as we did before, we now attempt to explain the clustering of 
extreme events in financial return data in terms of self-exciting behaviour. 
16.2.1 Self-exciting Processes


We first consider modelling the occurrence of threshold exceedances in time using 
a simple form of self-exciting point process known as a Hawkes process. The basic 
idea is that instead of modelling the instantaneous risk of a threshold exceedance as 
being constant over time, we assume that threshold exceedances in the recent past 
cause the instantaneous risk of a threshold exceedance to be higher. The main area 
of application of these models has traditionally been in the modelling of earthquakes 
and their aftershocks, although there is a growing number of applications in financial 
modelling (see Notes and Comments). 

Given data $X_1, \dots, X_n$ and a threshold $u$, we will assume as usual that there are
$N_u$ exceedances, comprising the data $\{(i, X_i) : 1 \le i \le n, X_i > u\}$. Note that
from now on we will express the time of an exceedance on the natural timescale
of the time series, so if, for example, the data are daily observations, then our
times are expressed in days. It will also be useful to have the alternative notation
$\{(T_j, \tilde X_j) : j = 1, \dots, N_u\}$, which enumerates exceedances consecutively.

First we consider a model for exceedance times only. In point process notation we
let $Y_i = i I_{\{X_i > u\}}$, so $Y_i$ returns an exceedance time, in the event that one takes place
at time $i$, and returns 0 otherwise. The point process of exceedances is the process
$N(\cdot)$ with state space $\mathcal{X} = (0, n]$ given by $N(A) = \sum_{i=1}^{n} I_{\{Y_i \in A\}}$ for $A \subset \mathcal{X}$.

We assume that the point process $N(\cdot)$ is a self-exciting process with conditional
intensity

$$\lambda^*(t) = \tau + \psi \sum_{j:\, 0 < T_j < t} h(t - T_j, \tilde X_j - u), \tag{16.7}$$

where $\tau > 0$, $\psi \ge 0$ and $h$ is some positive-valued function. Each previous
exceedance $(T_j, \tilde X_j)$ contributes to the conditional intensity and the amount that
it contributes can depend on both the elapsed time $(t - T_j)$ since that exceedance
and the amount of the excess loss $(\tilde X_j - u)$ over the threshold. Informally, we
understand the conditional intensity as expressing the instantaneous chance of a
new exceedance of the threshold at time $t$, like the rate or intensity of an ordinary
Poisson process. However, in the self-exciting model, the conditional intensity is
itself a stochastic process that depends on $\omega$, the state of nature, through the history
of threshold exceedances up to (but not including) time $t$.

Possible parametric specifications of the $h$ function are

• $h(s, x) = e^{\delta x - \gamma s}$, where $\delta, \gamma > 0$; or

• $h(s, x) = e^{\delta x}(s + \gamma)^{-(\rho + 1)}$, where $\delta, \gamma, \rho > 0$.

Collecting all parameters in $\theta$, the likelihood takes the form

$$L(\theta; \text{data}) = \exp\Bigl(-\int_0^n \lambda^*(s)\,ds\Bigr) \prod_{i=1}^{N_u} \lambda^*(T_i),$$

and may be maximized numerically to obtain parameter estimates.
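
As a rough sketch of how the likelihood might be coded before being handed to a numerical optimizer, the following Python functions evaluate the conditional intensity (16.7) and the log-likelihood for the first specification, $h(s, x) = e^{\delta x - \gamma s}$. For this choice the integral of the intensity has a closed form, since each exceedance contributes $e^{\delta x}(1 - e^{-\gamma(n - T_j)})/\gamma$. The function names and the toy data are hypothetical, and `marks` is assumed to hold the excesses $\tilde X_j - u$.

```python
import math

def intensity(t, times, marks, tau, psi, gamma, delta):
    """Conditional intensity (16.7) with h(s, x) = exp(delta * x - gamma * s)."""
    excite = sum(math.exp(delta * x - gamma * (t - s))
                 for s, x in zip(times, marks) if s < t)
    return tau + psi * excite

def log_likelihood(times, marks, n, tau, psi, gamma, delta):
    """Log-likelihood: minus the integrated intensity plus the sum of log-intensities."""
    # Closed-form integral of the intensity over (0, n] for the exponential kernel.
    integrated = tau * n + psi * sum(
        math.exp(delta * x) * (1.0 - math.exp(-gamma * (n - s))) / gamma
        for s, x in zip(times, marks))
    log_terms = sum(math.log(intensity(s, times, marks, tau, psi, gamma, delta))
                    for s in times)
    return log_terms - integrated

# Toy data: exceedance times (in days) and excesses over the threshold.
times = [3.0, 4.0, 10.0, 40.0]
marks = [0.2, 1.0, 0.1, 0.5]
ll = log_likelihood(times, marks, n=50, tau=0.05, psi=0.02, gamma=0.03, delta=0.1)
```

Setting $\psi = 0$ removes the self-excitement and the expression reduces to the log-likelihood of a homogeneous Poisson process with rate $\tau$, namely $N_u \ln \tau - n\tau$.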
Figure 16.1. (a) S&P daily percentage loss data. (b) Two hundred largest losses. (c) A
Q-Q plot of inter-exceedance times against an exponential reference distribution. (d) The 
estimated intensity of exceeding the threshold in a self-exciting model. See Example 16.11 
for details. 


Example 16.11 (S&P daily percentage losses 1996-2003). We apply the self- 
exciting process methodology to all daily percentage losses incurred by the Standard 
& Poor’s index in the eight-year period 1996-2003 (2078 values). In Figure 16.1 the 
loss data are shown as well as the point process of the 200 largest daily losses exceed- 
ing a threshold of 1.5%. Clearly, there is clustering in the pattern of exceedance data, 
and the Q-Q plot shows that the inter-exceedance times are not exponential. 

We fit the simpler self-exciting model with $h(s, x) = e^{\delta x - \gamma s}$. The parameter
estimates (and standard errors) are $\hat\tau = 0.032\ (0.011)$, $\hat\psi = 0.016\ (0.0069)$,
$\hat\gamma = 0.026\ (0.011)$, $\hat\delta = 0.13\ (0.27)$, suggesting that all parameters except $\delta$ are
significant. The log-likelihood for the fitted model is -648.2, whereas the log-likelihood
for a homogeneous Poisson model is -668.2; the Poisson special case
can therefore clearly be rejected in a likelihood ratio test with a p-value less than
0.001. Figure 16.1 (d) shows the estimated intensity $\lambda^*(t)$ of crossing the threshold
throughout the data observation period, which seems to reflect the pattern of
exceedances observed.


Note that a simple refinement of this model (and those of the following section) 
would be to consider a self-exciting structure where both extreme negative and 
extreme positive returns contributed to the conditional intensity; this would involve 
setting upper and lower thresholds and considering exceedances of both. 
16.2.2 A Self-exciting POT Model


We now consider how the POT model of Section 5.3.2 might be generalized to 
incorporate a self-exciting component. We first develop a marked self-exciting model 
where marks have a generalized Pareto distribution but are unpredictable, meaning 
that the excess losses are iid GPD. In the second model we consider the case of 
predictable marks. In this model the excess losses are conditionally generalized 
Pareto, given the exceedance history up to the time of the mark, with a scaling 
parameter that depends on that history. In this way we get a model where, in a 
period of excitement, both the temporal intensity of occurrence and the magnitude 
of the exceedances increase. 

In point process language, our models are processes $N(\cdot)$ on a state space of the
form $\mathcal{X} = (0, n] \times (u, \infty)$ such that $N(A) = \sum_{j=1}^{N_u} I_{\{(T_j, \tilde X_j) \in A\}}$ for sets $A \subset \mathcal{X}$. To
build these models we start with the intensity of the reparametrized version of the
standard POT model given in (5.31). We recall that this model simply says that
exceedances of the threshold $u$ occur as a homogeneous Poisson process with rate $\tau$
and that excesses have a generalized Pareto distribution with df $G_{\xi,\beta}$.


Model with unpredictable marks. We first introduce the notation

$$v^*(t) = \sum_{j:\, 0 < T_j < t} h(t - T_j, \tilde X_j - u)$$

for the self-excitement function, where the function $h$ is as in Section 16.2.1. We
generalize (5.31) and consider a self-exciting model with conditional intensity

$$\lambda^*(t, x) = \frac{\tau + \psi v^*(t)}{\beta} \Bigl(1 + \xi \frac{x - u}{\beta}\Bigr)^{-1/\xi - 1} \tag{16.8}$$

on a state space $\mathcal{X} = (0, n] \times (u, \infty)$, where $\tau > 0$ and $\psi \ge 0$. Effectively, we
have combined the one-dimensional intensity in (16.7) with a GPD density. When
$\psi = 0$ we have an ordinary POT model with no self-exciting structure.

It is easy to calculate that the conditional rate of crossing the threshold $x \ge u$ at
time $t$, given information up to that time, is

$$\tau^*(t, x) = \int_x^\infty \lambda^*(t, y)\,dy = (\tau + \psi v^*(t)) \Bigl(1 + \xi \frac{x - u}{\beta}\Bigr)^{-1/\xi}, \tag{16.9}$$

which, for fixed $x$, is simply a one-dimensional self-exciting process of the form
(16.7). The implied distribution of the excess losses when an exceedance takes place
is generalized Pareto, because

$$\frac{\tau^*(t, u + x)}{\tau^*(t, u)} = \Bigl(1 + \frac{\xi x}{\beta}\Bigr)^{-1/\xi} = \bar G_{\xi,\beta}(x), \tag{16.10}$$

independently of $t$. Statistical fitting of this model is performed by maximizing a
likelihood of the form

$$L(\theta; \text{data}) = \exp\Bigl(-n\tau - \psi \int_0^n v^*(s)\,ds\Bigr) \prod_{j=1}^{N_u} \lambda^*(T_j, \tilde X_j). \tag{16.11}$$
A model with predictable marks. A model with predictable marks can be obtained
by generalizing (16.8) to get

$$\lambda^*(t, x) = \frac{\tau + \psi v^*(t)}{\beta + \alpha v^*(t)} \Bigl(1 + \xi \frac{x - u}{\beta + \alpha v^*(t)}\Bigr)^{-1/\xi - 1}, \tag{16.12}$$

where $\beta > 0$ and $\alpha \ge 0$. For simplicity we have assumed that the GPD scaling is
also linear in the self-excitement function $v^*(t)$. The properties of this model follow
immediately from the model with unpredictable marks. The conditional crossing
rate of the threshold $x \ge u$ at time $t$ is as in (16.9) with the parameter $\beta$ replaced
by the time-dependent self-exciting function $\beta + \alpha v^*(t)$. By repeating the calculation
in (16.10) we find that the distribution of the excess loss over the threshold,
given that an exceedance takes place at time $t$ and given the history of exceedances
up to time $t$, is generalized Pareto with df $G_{\xi,\, \beta + \alpha v^*(t)}$. The likelihood for fitting
the model is again (16.11), where the function $\lambda^*(t, x)$ is now given by (16.12).
Note that by comparing a model with $\alpha = 0$ and a model with $\alpha > 0$ we can formally
test the hypothesis that the marks are unpredictable using a likelihood ratio
test.


Example 16.12 (self-exciting POT model for S&P daily loss data). We continue 
the analysis of the data of Example 16.11 by fitting self-exciting POT models with 
both unpredictable and predictable marks to the 200 exceedances of the threshold 
u = 1.5%. The former is equivalent to fitting a self-exciting model to the exceedance 
times as in Example 16.11 and then fitting a GPD to the excess losses over the 
threshold; the estimated intensity of crossing the threshold is therefore identical 
to the one shown in Figure 16.1. The log-likelihood for this model is -783.4,
whereas a model with predictable marks gives a value of -779.3 for one extra
parameter $\alpha$; in a likelihood ratio test the p-value is 0.004, showing a significant
improvement.

In Figure 16.2 we show the exceedance data as well as the estimated intensity
$\tau^*(t, u)$ of exceeding the threshold in the model with predictable marks. We also
show the estimated mean of the GPD for the conditional distribution of the excess
loss above the threshold, given that an exceedance takes place at time $t$. The GPD
mean $(\beta + \alpha v^*(t))/(1 - \xi)$ and the intensity $\tau^*(t, u)$ are both affine functions of
the self-excitement function $v^*(t)$ and obviously follow its path.


Figure 16.2. (a) Exceedance pattern for 200 largest daily losses in S&P data. (b) Estimated
intensity of exceeding the threshold in a self-exciting POT model with predictable marks.
(c) Mean of the conditional generalized Pareto distribution of the excess loss above the
threshold. See Example 16.12 for details.


Calculating conditional risk measures. Finally, we note that self-exciting POT
models can be used to estimate a conditional VaR and also a conditional expected
shortfall. If we have analysed $n$ daily data ending on day $t$ and want to calculate, say,
a 99% VaR, then we treat the problem as a (conditional) return-level problem; we
look for the level at which the conditional exceedance intensity at a time point just
after $t$ (denoted by $t+$) is 0.01. In general, to calculate a conditional estimate of $\mathrm{VaR}^t_\alpha$
(for $\alpha$ sufficiently large) we would attempt to solve the equation $\tau^*(t+, x) = 1 - \alpha$
for some value of $x$ satisfying $x \ge u$. In the model with predictable marks this is
possible if $\tau + \psi v^*(t+) > 1 - \alpha$ and gives the formula

$$\mathrm{VaR}^t_\alpha = u + \frac{\beta + \alpha v^*(t+)}{\xi} \Bigl(\Bigl(\frac{1 - \alpha}{\tau + \psi v^*(t+)}\Bigr)^{-\xi} - 1\Bigr).$$

The associated conditional expected shortfall could then be calculated by observing
that the conditional distribution of excess losses above $\mathrm{VaR}^t_\alpha$ given information up to
time $t$ is GPD with shape parameter $\xi$ and scaling parameter given by
$\beta + \alpha v^*(t+) + \xi(\mathrm{VaR}^t_\alpha - u)$.
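
The return-level calculation can be sketched in Python as follows. To avoid the clash between the two uses of $\alpha$ in the text, the code calls the confidence level `level` and the mark-scaling parameter `a_mark`; these names, the function names and all numerical values are hypothetical. The sketch assumes $\xi \ne 0$, and $\xi < 1$ for the expected shortfall, whose formula uses the GPD mean (scale divided by $1 - \xi$).

```python
def crossing_rate(x, v, u, tau, psi, beta, a_mark, xi):
    """Conditional rate of exceeding level x >= u, given self-excitement value v."""
    scale = beta + a_mark * v
    return (tau + psi * v) * (1.0 + xi * (x - u) / scale) ** (-1.0 / xi)

def conditional_var(level, v, u, tau, psi, beta, a_mark, xi):
    """Solve crossing_rate(x) = 1 - level for x; requires tau + psi * v > 1 - level."""
    rate_u = tau + psi * v
    if rate_u <= 1.0 - level:
        raise ValueError("no solution above the threshold u")
    scale = beta + a_mark * v
    return u + scale / xi * (((1.0 - level) / rate_u) ** (-xi) - 1.0)

def conditional_es(level, v, u, tau, psi, beta, a_mark, xi):
    """VaR plus the mean of a GPD with shape xi < 1 and the rescaled scale parameter."""
    var = conditional_var(level, v, u, tau, psi, beta, a_mark, xi)
    scale_at_var = beta + a_mark * v + xi * (var - u)
    return var + scale_at_var / (1.0 - xi)

# Hypothetical parameter values; v stands for the self-excitement function at t+.
params = dict(v=3.0, u=1.5, tau=0.032, psi=0.016, beta=0.6, a_mark=0.05, xi=0.2)
var_99 = conditional_var(0.99, **params)
es_99 = conditional_es(0.99, **params)
```

Substituting `var_99` back into `crossing_rate` returns $1 - 0.99 = 0.01$, confirming that the closed-form expression solves the return-level equation.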


Notes and Comments 


The original reference to the Hawkes self-exciting process is Hawkes (1971). There 
is a large literature on the application of such processes to earthquake modelling; 
a starter reference is Ogata (1988). To our knowledge, the earliest contribution 
on Hawkes processes in financial econometrics was Bowsher (2002); this paper 
finally appeared as Bowsher (2007). The application to extremes in financial time 
series was suggested in Chavez-Demoulin, Davison and McNeil (2005). Embrechts, 
Liniger and Lin (2011) give an application of multivariate Hawkes processes to 
multiple financial time series. Self-exciting processes have also been applied in 
credit risk (Errais, Giesecke and Goldberg 2010) and in the modelling of high- 
frequency financial data (Chavez-Demoulin and McGill 2012). 

The idea detailed in Section 16.2.2 (of a POT model with self-exciting structure) 
was first proposed in the first edition of this textbook. Grothe, Korniichuk and 
Manner (2014) extend the idea and consider multivariate models for extremes where 
component processes may excite themselves or other component processes. 


16.3 Multivariate Maxima 


In this section we give a brief overview of the theory of multivariate maxima, stating 
the main results in terms of copulas. A class of copulas known as extreme value cop- 
ulas emerges as the class of natural limiting dependence structures for multivariate 
maxima. These provide useful dependence structures for modelling the joint tail 
behaviour of risk factors that appear to show tail dependence. A useful reference is 
Galambos (1987), which is one of the few texts to treat the theory of multivariate 
maxima as a copula theory (although Galambos does not use the word, referring to 
copulas simply as dependence functions). 


16.3.1 Multivariate Extreme Value Copulas 


Let $\mathbf{X}_1, \dots, \mathbf{X}_n$ be iid random vectors in $\mathbb{R}^d$ with joint df F and marginal dfs
$F_1, \dots, F_d$. We label the components of these vectors $\mathbf{X}_i = (X_{i,1}, \dots, X_{i,d})'$ and
interpret them as losses of d different types. We define the maximum of the jth
component to be $M_{n,j} = \max(X_{1,j}, \dots, X_{n,j})$, $j = 1, \dots, d$. In classical
multivariate EVT, the object of interest is the vector of componentwise block maxima:
$\mathbf{M}_n = (M_{n,1}, \dots, M_{n,d})'$. In particular, we are interested in the possible multivariate
limiting distributions for $\mathbf{M}_n$ under appropriate normalizations, much as in the
univariate case. It should, however, be observed that the vector $\mathbf{M}_n$ will in general not
correspond to any of the vector observations $\mathbf{X}_i$.

We seek limit laws for

$$\frac{\mathbf{M}_n - \mathbf{d}_n}{\mathbf{c}_n} = \left( \frac{M_{n,1} - d_{n,1}}{c_{n,1}}, \dots, \frac{M_{n,d} - d_{n,d}}{c_{n,d}} \right)'$$

as $n \to \infty$, where $\mathbf{c}_n = (c_{n,1}, \dots, c_{n,d})'$ and $\mathbf{d}_n = (d_{n,1}, \dots, d_{n,d})'$ are
vectors of normalizing constants, the former satisfying $\mathbf{c}_n > \mathbf{0}$. Note that in this and
other statements in this section, arithmetic operations on vectors of equal length are
understood as componentwise operations. Supposing that $(\mathbf{M}_n - \mathbf{d}_n)/\mathbf{c}_n$ converges
in distribution to a random vector with joint df H, we have

$$\lim_{n \to \infty} P\left( \frac{\mathbf{M}_n - \mathbf{d}_n}{\mathbf{c}_n} \leq \mathbf{x} \right) = \lim_{n \to \infty} F^n(\mathbf{c}_n \mathbf{x} + \mathbf{d}_n) = H(\mathbf{x}). \tag{16.13}$$


Definition 16.13 (multivariate extreme value distribution and domain of attraction).
If (16.13) holds for some F and some H, we say that F is in the maximum
domain of attraction of H, written $F \in \mathrm{MDA}(H)$, and we refer to H as a multivariate
extreme value (MEV) distribution.


The convergence issue for multivariate maxima is already partly solved by the 
univariate theory. If H has non-degenerate margins, then these must be univariate 
extreme value distributions of Fréchet, Gumbel or Weibull type. Since these are con- 
tinuous, Sklar’s Theorem tells us that H must have a unique copula. The following 
theorem asserts that this copula C must have a particular kind of scaling behaviour. 


Theorem 16.14 (extreme value copula). If (16.13) holds for some F and some H 
with GEV margins, then the unique copula C of H satisfies 


$$C(\mathbf{u}^t) = C^t(\mathbf{u}), \quad \forall t > 0. \tag{16.14}$$


Any copula with the property (16.14) is known as an extreme value (EV) copula
and can be the copula of an MEV distribution. The independence and comonotonicity 
copulas are EV copulas and the Gumbel copula provides an example of a parametric 
EV copula family. The bivariate version in (7.12) obviously has property (16.14), as 
does the exchangeable higher-dimensional Gumbel copula based on (7.46) as well 
as the non-exchangeable versions based on (15.10)—(15.12). 
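
The scaling property (16.14) is easy to check numerically. The sketch below is illustrative only: it uses the standard closed forms of the bivariate Gumbel and Clayton copulas, and the helper name `scaling_gap` is our own. The Gumbel copula satisfies the property up to rounding error, whereas the Clayton copula, which is not an EV copula, does not.

```python
import math


def gumbel_copula(u1, u2, theta):
    """Bivariate Gumbel copula, theta >= 1, for u1, u2 in (0, 1]."""
    s = (-math.log(u1)) ** theta + (-math.log(u2)) ** theta
    return math.exp(-s ** (1.0 / theta))


def clayton_copula(u1, u2, theta):
    """Bivariate Clayton copula, theta > 0, for u1, u2 in (0, 1]."""
    return (u1 ** (-theta) + u2 ** (-theta) - 1.0) ** (-1.0 / theta)


def scaling_gap(copula, u1, u2, t, theta):
    """|C(u^t) - C(u)^t|; this vanishes for all t > 0 when (16.14) holds at u."""
    return abs(copula(u1 ** t, u2 ** t, theta) - copula(u1, u2, theta) ** t)


if __name__ == "__main__":
    for t in (0.5, 2.0, 7.0):
        print(t,
              scaling_gap(gumbel_copula, 0.3, 0.6, t, 2.0),
              scaling_gap(clayton_copula, 0.3, 0.6, t, 2.0))
```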

There are a number of mathematical results characterizing MEV distributions and 
EV copulas. One such result is the following. 


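Theorem 16.15 (Pickands representation). The copula C is an EV copula if and
only if it has the representation

$$C(\mathbf{u}) = \exp\left\{ B\left( \frac{\ln u_1}{\sum_{k=1}^d \ln u_k}, \dots, \frac{\ln u_d}{\sum_{k=1}^d \ln u_k} \right) \sum_{i=1}^d \ln u_i \right\}, \tag{16.15}$$

where $B(\mathbf{w}) = \int_{\mathcal{S}_d} \max(x_1 w_1, \dots, x_d w_d) \, dS(\mathbf{x})$ and S is a finite measure on the
d-dimensional simplex, i.e. the set $\mathcal{S}_d = \{\mathbf{x} : x_i \geq 0, \ i = 1, \dots, d, \ \sum_{i=1}^d x_i = 1\}$.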


The function B(w) is sometimes referred to as the dependence function of the EV 
copula. In the general case, such functions are difficult to visualize and work with, 
but in the bivariate case they have a simple form that we discuss in more detail. 

In the bivariate case we redefine $B(\mathbf{w})$ as a function of a scalar argument by
setting $A(w) := B((w, 1 - w)')$ with $w \in [0, 1]$. It follows from Theorem 16.15
that a bivariate copula is an EV copula if and only if it takes the form

$$C(u_1, u_2) = \exp\left\{ (\ln u_1 + \ln u_2) \, A\left( \frac{\ln u_1}{\ln u_1 + \ln u_2} \right) \right\}, \tag{16.16}$$

where $A(w) = \int_0^1 \max((1 - x)w, x(1 - w)) \, dS(x)$ for a measure S on [0, 1]. It can
be inferred that such bivariate dependence functions must satisfy

$$\max(w, 1 - w) \leq A(w) \leq 1, \quad 0 \leq w \leq 1, \tag{16.17}$$

and that they must, moreover, be convex. Conversely, a differentiable convex
function A(w) satisfying (16.17) can be used to construct an EV copula using (16.16).
The upper and lower bounds in (16.17) have intuitive interpretations. If A(w) = 1
for all w, then the copula (16.16) is clearly the independence copula, and if
$A(w) = \max(w, 1 - w)$, then it is the comonotonicity copula. It is also useful to note, and
easy to show, that we can extract the dependence function from the EV copula
in (16.16) by setting

$$A(w) = -\ln C(e^{-w}, e^{-(1-w)}), \quad w \in [0, 1]. \tag{16.18}$$

Example 16.16 (Gumbel copula). We consider the asymmetric version of the
bivariate Gumbel copula defined by (7.12) and construction (15.10), i.e. the copula

$$C^{\mathrm{Gu}}_{\theta,\alpha,\beta}(u_1, u_2) = u_1^{1-\alpha} u_2^{1-\beta} \exp\{ -((-\alpha \ln u_1)^\theta + (-\beta \ln u_2)^\theta)^{1/\theta} \}.$$

As already remarked, this copula has the scaling property (16.14) and is an EV
copula. Using (16.18) we calculate that the dependence function is given by

$$A(w) = (1 - \alpha) w + (1 - \beta)(1 - w) + ((\alpha w)^\theta + (\beta (1 - w))^\theta)^{1/\theta}.$$


Figure 16.3. Plot of dependence functions for the (a) symmetric Gumbel, (b) asymmetric
Gumbel, (c) symmetric Galambos and (d) asymmetric Galambos copulas (asymmetric cases
have $\alpha = 0.9$ and $\beta = 0.8$) as described in Examples 16.16 and 16.17. Dashed lines show
boundaries of the triangle in which the dependence function must reside; solid lines show
dependence functions for a range of parameter values.


We have plotted this function in Figure 16.3 for a range of $\theta$ values running from 1.1
to 10 in steps of size 0.1. Part (a) shows the standard symmetric Gumbel copula with
$\alpha = \beta = 1$; the dependence function essentially spans the whole range from
independence, represented by the upper edge of the dashed triangle, to comonotonicity,
represented by the two lower edges of the dashed triangle, which comprise the
function $A(w) = \max(w, 1 - w)$. Part (b) shows an asymmetric example with $\alpha = 0.9$
and $\beta = 0.8$; in this case we still have independence when $\theta = 1$, but the limit as
$\theta \to \infty$ is no longer the comonotonicity model. The Gumbel copula model is also
sometimes known as the logistic model.


Example 16.17 (Galambos copula). This time we begin with the dependence
function given by

$$A(w) = 1 - ((\alpha w)^{-\theta} + (\beta (1 - w))^{-\theta})^{-1/\theta}, \tag{16.19}$$

where $0 \leq \alpha, \beta \leq 1$ and $0 < \theta < \infty$. It can be verified that this is a convex function
satisfying $\max(w, 1 - w) \leq A(w) \leq 1$ for $0 \leq w \leq 1$, so it can be used to create
an EV copula in (16.16). We obtain the copula

$$C^{\mathrm{Gal}}_{\theta,\alpha,\beta}(u_1, u_2) = u_1 u_2 \exp\{ ((-\alpha \ln u_1)^{-\theta} + (-\beta \ln u_2)^{-\theta})^{-1/\theta} \},$$

which has also been called the negative logistic model. We have plotted this function
in Figure 16.3 for a range of $\theta$ values running from 0.2 to 5 in steps of size 0.1.
Part (c) shows the standard symmetric case with $\alpha = \beta = 1$ spanning the whole
range from independence to comonotonicity. Part (d) shows an asymmetric example
with $\alpha = 0.9$ and $\beta = 0.8$; in this case we still approach independence as $\theta \to 0$,
but the limit as $\theta \to \infty$ is no longer the comonotonicity model.
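
The two dependence functions of Examples 16.16 and 16.17, the construction (16.16) and the inversion (16.18) can be sketched in a few lines. This is an illustrative sketch rather than a reference implementation: the function names are our own, and the Galambos function assumes strictly positive $\alpha$ and $\beta$ so that the negative powers are finite.

```python
import math


def a_gumbel(w, theta, alpha=1.0, beta=1.0):
    """Dependence function of the (possibly asymmetric) Gumbel copula, theta >= 1."""
    return ((1 - alpha) * w + (1 - beta) * (1 - w)
            + ((alpha * w) ** theta + (beta * (1 - w)) ** theta) ** (1 / theta))


def a_galambos(w, theta, alpha=1.0, beta=1.0):
    """Dependence function (16.19); assumes alpha, beta > 0 and theta > 0."""
    if w <= 0.0 or w >= 1.0:
        return 1.0  # boundary values, obtained by continuity
    return 1 - ((alpha * w) ** (-theta)
                + (beta * (1 - w)) ** (-theta)) ** (-1 / theta)


def ev_copula(u1, u2, dep_fn):
    """EV copula (16.16) built from a dependence function; u1, u2 in (0, 1)."""
    s = math.log(u1) + math.log(u2)
    return math.exp(s * dep_fn(math.log(u1) / s))


def recovered_dependence(w, copula):
    """Formula (16.18): A(w) = -ln C(exp(-w), exp(-(1 - w)))."""
    return -math.log(copula(math.exp(-w), math.exp(-(1 - w))))


if __name__ == "__main__":
    for w in (0.1, 0.3, 0.5, 0.7, 0.9):
        print(w, max(w, 1 - w),
              a_gumbel(w, 2.5, 0.9, 0.8), a_galambos(w, 1.5, 0.9, 0.8))
```

Printing the lower bound $\max(w, 1 - w)$ next to the two functions shows how far the asymmetric cases stay from the comonotonic boundary for moderate values of $\theta$.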


A number of other bivariate EV copulas have been described in the literature (see 
Notes and Comments). 


16.3.2 Copulas for Multivariate Minima 


The structure of limiting copulas for multivariate minima can be easily inferred from 
the structure of limiting copulas for multivariate maxima; moving from maxima to 
minima essentially involves the same considerations that we made at the end of 
Section 5.1.1 and uses identity (5.2) in particular. 

Normalized componentwise minima of iid random vectors $\mathbf{X}_1, \dots, \mathbf{X}_n$ with df
F will converge in distribution to a non-degenerate limit if the df $\tilde{F}$ of the random
vectors $-\mathbf{X}_1, \dots, -\mathbf{X}_n$ is in the maximum domain of attraction of an MEV
distribution (see Definition 16.13), written $\tilde{F} \in \mathrm{MDA}(H)$. Of course, for a radially
symmetric distribution, $\tilde{F}$ coincides with F.

Let $\mathbf{M}^*_n$ be the vector of componentwise maxima of $-\mathbf{X}_1, \dots, -\mathbf{X}_n$ such that
$M^*_{n,j} = \max(-X_{1,j}, \dots, -X_{n,j})$. If $\tilde{F} \in \mathrm{MDA}(H)$ for some non-degenerate H,
we have

$$\lim_{n \to \infty} P\left( \frac{\mathbf{M}^*_n - \mathbf{d}_n}{\mathbf{c}_n} \leq \mathbf{x} \right) = \lim_{n \to \infty} \tilde{F}^n(\mathbf{c}_n \mathbf{x} + \mathbf{d}_n) = H(\mathbf{x}) \tag{16.20}$$

for appropriate sequences of normalizing vectors $\mathbf{c}_n$ and $\mathbf{d}_n$, and an MEV
distribution H of the form $H(\mathbf{x}) = C(H_{\xi_1}(x_1), \dots, H_{\xi_d}(x_d))$, where $H_{\xi_j}$ denotes a GEV
distribution with shape parameter $\xi_j$ and C is an EV copula satisfying (16.14).
Defining the vector of componentwise minima by $\mathbf{m}_n$ and using (5.2), it follows
from (16.20) that

$$\lim_{n \to \infty} P\left( \frac{\mathbf{m}_n + \mathbf{d}_n}{\mathbf{c}_n} \geq \mathbf{x} \right) = H(-\mathbf{x}),$$

so normalized minima converge in distribution to a limit with survival function
$H(-\mathbf{x}) = C(H_{\xi_1}(-x_1), \dots, H_{\xi_d}(-x_d))$. It follows that the copula of the limiting
distribution of the minima is the survival copula of C (see Section 7.1.5 for discussion
of survival copulas). In general, the limiting copulas for minima are survival copulas
of EV copulas and concrete examples of such copulas are the Gumbel and Galambos
survival copulas.

In the special case of a radially symmetric underlying distribution, the limiting 
copula of the minima is precisely the survival copula of the limiting EV copula of 
the maxima. 


16.3.3 Copula Domains of Attraction 


As in the case of univariate maxima we would like to know which underlying 
multivariate dfs F are attracted to which MEV distributions H. We now give a 
useful result in terms of copulas that is essentially due to Galambos (see Notes and 
Comments). 


Theorem 16.18. Let $F(\mathbf{x}) = C(F_1(x_1), \dots, F_d(x_d))$ for continuous marginal dfs
$F_1, \dots, F_d$ and some copula C. Let $H(\mathbf{x}) = C_0(H_1(x_1), \dots, H_d(x_d))$ be an MEV
distribution with EV copula $C_0$. Then $F \in \mathrm{MDA}(H)$ if and only if $F_i \in \mathrm{MDA}(H_i)$
for $1 \leq i \leq d$ and

$$\lim_{t \to \infty} C^t(u_1^{1/t}, \dots, u_d^{1/t}) = C_0(u_1, \dots, u_d), \quad \mathbf{u} \in [0, 1]^d. \tag{16.21}$$


This result shows that the copula $C_0$ of the limiting MEV distribution is determined
solely by the copula C of the underlying distribution according to (16.21); the
marginal distributions of F determine the margins of the MEV limit but are irrelevant
to the determination of its dependence structure. This motivates us to introduce the
concept of a copula domain of attraction.


Definition 16.19. If (16.21) holds for some C and some EV copula $C_0$, we say that
C is in the copula domain of attraction of $C_0$, written $C \in \mathrm{CDA}(C_0)$.


There are a number of equivalent ways of writing (16.21). First, by taking
logarithms and using the asymptotic identity $\ln(x) \sim x - 1$ as $x \to 1$, we get, for
$\mathbf{u} \in (0, 1]^d$,

$$\lim_{t \to \infty} t \, (1 - C(u_1^{1/t}, \dots, u_d^{1/t})) = -\ln C_0(u_1, \dots, u_d),$$

$$\lim_{s \to 0^+} \frac{1 - C(u_1^s, \dots, u_d^s)}{s} = -\ln C_0(u_1, \dots, u_d). \tag{16.22}$$

By inserting $u_i = e^{-x_i}$ in the latter identity and using $e^{-sx} \sim 1 - sx$ as $s \to 0$, we
get, for $\mathbf{x} \in [0, \infty)^d$,

$$\lim_{s \to 0^+} \frac{1 - C(1 - s x_1, \dots, 1 - s x_d)}{s} = -\ln C_0(e^{-x_1}, \dots, e^{-x_d}). \tag{16.23}$$


Example 16.20 (limiting copula for bivariate Pareto distribution). In Example 7.14
we saw that the bivariate Pareto distribution has univariate Pareto margins
$F_i(x) = 1 - (\kappa_i/(\kappa_i + x))^\alpha$ and a Clayton survival copula. It follows from
Example 5.6 that $F_i \in \mathrm{MDA}(H_{1/\alpha})$, i = 1, 2. Using (7.16), the Clayton survival copula
is calculated to be

$$C(u_1, u_2) = u_1 + u_2 - 1 + ((1 - u_1)^{-1/\alpha} + (1 - u_2)^{-1/\alpha} - 1)^{-\alpha}.$$

Using (16.23), it is easily calculated that

$$C_0(u_1, u_2) = u_1 u_2 \exp\{ ((-\ln u_1)^{-1/\alpha} + (-\ln u_2)^{-1/\alpha})^{-\alpha} \},$$

which is the standard exchangeable Galambos copula of Example 16.17 with
$\theta = 1/\alpha$. The limiting distribution of maxima therefore consists of two Fréchet dfs
connected by the Galambos copula.
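
The limit in (16.23) for this example can be checked numerically. The following sketch evaluates the ratio on the left-hand side of (16.23) for the Clayton survival copula at decreasing values of s and compares it with the Galambos target. The helper names and the choice $\alpha = 2$, $x_1 = x_2 = 1$ are illustrative assumptions.

```python
import math


def clayton_survival(u1, u2, alpha):
    """Survival copula of a Clayton copula with parameter 1/alpha."""
    inner = (1.0 - u1) ** (-1.0 / alpha) + (1.0 - u2) ** (-1.0 / alpha) - 1.0
    return u1 + u2 - 1.0 + inner ** (-alpha)


def galambos(u1, u2, theta):
    """Standard exchangeable Galambos copula for u1, u2 in (0, 1)."""
    x, y = -math.log(u1), -math.log(u2)
    return u1 * u2 * math.exp((x ** (-theta) + y ** (-theta)) ** (-1.0 / theta))


def cda_ratio(copula, x1, x2, s):
    """Left-hand side of (16.23) before taking the limit s -> 0+."""
    return (1.0 - copula(1.0 - s * x1, 1.0 - s * x2)) / s


def cda_target(limit_copula, x1, x2):
    """Right-hand side of (16.23)."""
    return -math.log(limit_copula(math.exp(-x1), math.exp(-x2)))


if __name__ == "__main__":
    alpha = 2.0
    target = cda_target(lambda a, b: galambos(a, b, 1.0 / alpha), 1.0, 1.0)
    for s in (1e-2, 1e-4, 1e-6):
        ratio = cda_ratio(lambda a, b: clayton_survival(a, b, alpha), 1.0, 1.0, s)
        print(s, ratio, target)
```

Much smaller values of s are best avoided in floating-point arithmetic, because the numerator is a difference of two numbers close to one.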


The coefficients of upper tail dependence play an interesting role in the copula 
domain of attraction theory. In particular, they can help us to recognize copulas that 
lie in the copula domain of attraction of the independence copula. 


Proposition 16.21. Let C be a bivariate copula with upper tail-dependence
coefficient $\lambda_u$, and assume that C satisfies $C \in \mathrm{CDA}(C_0)$ for some EV copula $C_0$. Then
$\lambda_u$ is also the upper tail-dependence coefficient of $C_0$ and is related to its dependence
function by $\lambda_u = 2(1 - A(\tfrac{1}{2}))$.

Proof. We use (7.35) and (7.16) to see that

$$\lambda_u = \lim_{q \to 1^-} \frac{\hat{C}(1 - q, 1 - q)}{1 - q} = 2 - \lim_{q \to 1^-} \frac{1 - C(q, q)}{1 - q}.$$

By using the asymptotic identity $\ln x \sim x - 1$ as $x \to 1$ and the CDA
condition (16.22) we can calculate

$$\lim_{q \to 1^-} \frac{1 - C_0(q, q)}{1 - q} = \lim_{q \to 1^-} \frac{\ln C_0(q, q)}{\ln q}
= \lim_{q \to 1^-} \lim_{s \to 0^+} \frac{1 - C(q^s, q^s)}{-s \ln q}
= \lim_{q \to 1^-} \lim_{s \to 0^+} \frac{1 - C(q^s, q^s)}{-\ln(q^s)}
= \lim_{v \to 1^-} \frac{1 - C(v, v)}{1 - v},$$

which shows that C and $C_0$ share the same coefficient of upper tail dependence.
Using the formula $\lambda_u = 2 - \lim_{q \to 1^-} \ln C_0(q, q)/\ln q$ and the representation (16.16)
we easily obtain that $\lambda_u = 2(1 - A(\tfrac{1}{2}))$.


In the case when $\lambda_u = 0$ we must have $A(\tfrac{1}{2}) = 1$, and the convexity of dependence
functions dictates that A(w) is identically 1, so $C_0$ must be the independence copula.
In the higher-dimensional case this is also true: if C is a d-dimensional copula with
all upper tail-dependence coefficients equal to 0, then the bivariate margins of the
limiting copula $C_0$ must all be independence copulas, and, in fact, it can be shown
that $C_0$ must therefore be the d-dimensional independence copula (see Notes and
Comments).
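
The relation $\lambda_u = 2(1 - A(\tfrac{1}{2}))$ can be illustrated for the symmetric Gumbel and Galambos dependence functions, for which it gives $2 - 2^{1/\theta}$ and $2^{-1/\theta}$, respectively. The sketch below compares these values with a finite-q version of the defining limit, using the fact that (16.16) implies $C_0(q, q) = q^{2A(1/2)}$. Function names are our own.

```python
def a_gumbel(w, theta):
    """Symmetric Gumbel dependence function, theta >= 1."""
    return (w ** theta + (1 - w) ** theta) ** (1 / theta)


def a_galambos(w, theta):
    """Symmetric Galambos dependence function for 0 < w < 1, theta > 0."""
    return 1 - (w ** (-theta) + (1 - w) ** (-theta)) ** (-1 / theta)


def ev_copula_diagonal(q, dep_fn):
    """C0(q, q) for a bivariate EV copula: by (16.16) this is q ** (2 * A(1/2))."""
    return q ** (2.0 * dep_fn(0.5))


def lambda_u_from_dependence(dep_fn):
    """Upper tail-dependence coefficient via Proposition 16.21."""
    return 2.0 * (1.0 - dep_fn(0.5))


def lambda_u_numeric(dep_fn, q=1.0 - 1e-6):
    """Finite-q version of 2 - (1 - C0(q, q)) / (1 - q)."""
    return 2.0 - (1.0 - ev_copula_diagonal(q, dep_fn)) / (1.0 - q)


if __name__ == "__main__":
    for name, dep in (("Gumbel", lambda w: a_gumbel(w, 2.0)),
                      ("Galambos", lambda w: a_galambos(w, 2.0)),
                      ("independence", lambda w: 1.0)):
        print(name, lambda_u_from_dependence(dep), lambda_u_numeric(dep))
```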

As an example consider the limiting distribution of multivariate maxima of Gaus- 
sian random vectors. Since the pairwise coefficients of tail dependence of Gaussian 
vectors are 0 (see Example 7.38), the limiting distribution is a product of marginal 
Gumbel distributions. The convergence is extremely slow, but ultimately normalized 
componentwise maxima are independent in the limit. 

Now consider the multivariate t distribution, which has been an important model
throughout this book. If $\mathbf{X}_1, \dots, \mathbf{X}_n$ are iid random vectors with a $t_d(\nu, \boldsymbol{\mu}, \Sigma)$
distribution, we know from Example 16.1 that univariate maxima of the individual
components are attracted to univariate Fréchet distributions with parameter $1/\nu$.
Moreover, we know from Example 7.39 that tail dependence coefficients for the
t copula are strictly positive; the limiting EV copula cannot be the independence
copula.

In fact, the limiting EV copula for t-distributed random vectors can be calculated
using (16.23), although the calculations are tedious. In the bivariate case it is found
that the limiting copula, which we call the t-EV copula, has dependence function

$$A(w) = w \, t_{\nu+1}\left( \frac{(w/(1 - w))^{1/\nu} - \rho}{\sqrt{(1 - \rho^2)/(\nu + 1)}} \right) + (1 - w) \, t_{\nu+1}\left( \frac{((1 - w)/w)^{1/\nu} - \rho}{\sqrt{(1 - \rho^2)/(\nu + 1)}} \right), \tag{16.24}$$

where $\rho$ is the off-diagonal component of $P = \wp(\Sigma)$. This dependence function is
shown in Figure 16.4 for four different values of $\nu$ and for $\rho$ values ranging from
-0.5 to 0.9 with increments of 0.1. As $\rho \to 1$ the t-EV copula converges to the
comonotonicity copula; as $\rho \to -1$ or as $\nu \to \infty$ it converges to the independence
copula.


Figure 16.4. Plots of the dependence function for the t-EV copula for
(a) $\nu = 2$, (b) $\nu = 4$, (c) $\nu = 10$ and (d) $\nu = 20$, and with various values of $\rho$.
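
The dependence function (16.24) is straightforward to evaluate once a Student t distribution function is available. The sketch below stays within the Python standard library and therefore approximates the t df by Simpson integration of the density; this numerical shortcut, like the function names, is our own assumption and would normally be replaced by a library routine. The sketch requires $|\rho| < 1$.

```python
import math


def t_cdf(x, nu, steps=2000):
    """Student t df via composite Simpson integration of the density on [0, x]."""
    c = math.gamma((nu + 1) / 2) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2))

    def dens(y):
        return c * (1 + y * y / nu) ** (-(nu + 1) / 2)

    h = x / steps  # steps is even; a negative x gives a negative integral
    total = dens(0.0) + dens(x)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * dens(k * h)
    return 0.5 + total * h / 3


def a_tev(w, nu, rho):
    """Dependence function (16.24) of the t-EV copula; requires |rho| < 1."""
    if w <= 0.0 or w >= 1.0:
        return 1.0  # boundary values, obtained by continuity
    scale = math.sqrt((1 - rho ** 2) / (nu + 1))
    z1 = ((w / (1 - w)) ** (1 / nu) - rho) / scale
    z2 = (((1 - w) / w) ** (1 / nu) - rho) / scale
    return w * t_cdf(z1, nu + 1) + (1 - w) * t_cdf(z2, nu + 1)


if __name__ == "__main__":
    for rho in (-0.5, 0.0, 0.5, 0.9):
        print(rho, [round(a_tev(k / 10, 4, rho), 4) for k in range(1, 10)])
```

At $w = \tfrac{1}{2}$ the function reduces to $t_{\nu+1}(\sqrt{(\nu + 1)(1 - \rho)/(1 + \rho)})$, so that $2(1 - A(\tfrac{1}{2}))$ coincides with $2 t_{\nu+1}(-\sqrt{(\nu + 1)(1 - \rho)/(1 + \rho)})$, which is the usual expression for the tail-dependence coefficient of the t copula.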


16.3.4 Modelling Multivariate Block Maxima 


A multivariate block maxima method analogous to the univariate method of Sec- 
tion 5.1.4 could be developed, although similar criticisms apply, namely that the 
block maxima method is not the most efficient way of making use of extreme data. 
Also, the kind of inference that this method allows may not be exactly what is desired 
in the multivariate case, as will be seen. 

Suppose we divide our underlying data into blocks as before and we denote
the realizations of the block maxima vectors by $\mathbf{M}_{n,1}, \dots, \mathbf{M}_{n,m}$, where m is the
total number of blocks. The distributional model suggested by the univariate and
multivariate maxima theory consists of GEV margins connected by an extreme value
copula.

In the multivariate theory there is, in a sense, a “correct” EV copula to use, which
is the copula $C_0$ to which the copula C of the underlying distribution of the raw
data is attracted. However, the underlying copula C is unknown and so the approach
is generally to work with any tractable EV copula that appears appropriate for the
task in hand. In a bivariate application, if we restrict to exchangeable copulas, then
we have at our disposal the Gumbel, Galambos and t-EV copulas, and a number of
other possibilities for which references in Notes and Comments should be consulted.
As will be apparent from Figures 16.3 and 16.4, the essential functional form of all
these families is really very similar; it is mostly sufficient to work with either the
Gumbel copula or the Galambos copula as these have simple forms that permit a
relatively easy calculation of the copula density (which is needed for likelihood
inference). Even if the “true” underlying copula were t, it would seldom make sense
to use the more complicated t-EV copula, since the dependence function in (16.24)
can be accurately approximated by the dependence function of a Gumbel copula for
many values of $\nu$ and $\rho$.

The Gumbel copula also allows us to explore the possibility of asymmetry by 
using the general non-exchangeable family described in Example 16.16. For appli- 
cations in dimensions higher than two, the higher-dimensional extensions of Gum- 
bel discussed in Sections 7.4.2 and 15.2.2 may be useful, although we should stress 
again that multivariate extreme value models are best suited to low-dimensional 
applications. 

Putting these considerations together, data on multivariate maxima could be modelled
using the df $H_{\boldsymbol{\xi},\boldsymbol{\mu},\boldsymbol{\sigma},\boldsymbol{\theta}}(\mathbf{x}) = C_{\boldsymbol{\theta}}(H_{\xi_1,\mu_1,\sigma_1}(x_1), \dots, H_{\xi_d,\mu_d,\sigma_d}(x_d))$ for some
tractable parametric EV copula $C_{\boldsymbol{\theta}}$. The usual method involves maximum likelihood
inference and the maximization can either be performed in one step for all
parameters of the margins and copula or broken into two steps, whereby marginal
models are estimated first and then a parametric copula is fitted using the ideas in
Sections 7.5.2 and 7.5.3. The following bivariate example gives an idea of the kind
of inference that can be made with such a model.


Example 16.22. Let $M_{65,1}$ represent the quarterly maximum of daily percentage
falls of the US dollar against the euro and let $M_{65,2}$ represent the quarterly maximum
of daily percentage falls of the US dollar against the yen. We define a stress event for
each of these daily return series: for the dollar against the euro we might be concerned
about a 4% fall in any one day; for the dollar against the yen we might be concerned
about a 5% fall in any one day. We want to estimate the unconditional probability
that one or both of these stress events occurs over any quarter. The probability p of
interest is given by $p = 1 - P(M_{65,1} \leq 4\%, M_{65,2} \leq 5\%)$ and is approximated by
$1 - H_{\boldsymbol{\xi},\boldsymbol{\mu},\boldsymbol{\sigma},\boldsymbol{\theta}}(0.04, 0.05)$, where the parameters are estimated from the block maxima
data. Of course, a more worrying scenario might be that both of these stress events
should occur on the same day. To calculate the probability of simultaneous extreme
events we require a different methodology, which is developed in Section 16.4.
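
Once parameter estimates are available, the probability in Example 16.22 is a one-line calculation. The sketch below evaluates $1 - H(0.04, 0.05)$ for a bivariate model with GEV margins and a Gumbel copula. All parameter values are hypothetical placeholders rather than estimates from data, and the function names are our own.

```python
import math


def gev_cdf(x, xi, mu, sigma):
    """GEV distribution function with shape xi, location mu and scale sigma."""
    z = (x - mu) / sigma
    if xi == 0.0:
        return math.exp(-math.exp(-z))
    arg = 1.0 + xi * z
    if arg <= 0.0:
        return 0.0 if xi > 0 else 1.0  # x lies outside the support
    return math.exp(-arg ** (-1.0 / xi))


def gumbel_copula(u1, u2, theta):
    """Bivariate Gumbel copula, theta >= 1, for u1, u2 in (0, 1)."""
    s = (-math.log(u1)) ** theta + (-math.log(u2)) ** theta
    return math.exp(-s ** (1.0 / theta))


def prob_any_stress(levels, margins, theta):
    """1 - H(levels) for GEV margins (xi, mu, sigma) joined by a Gumbel copula."""
    u1, u2 = (gev_cdf(x, *par) for x, par in zip(levels, margins))
    return 1.0 - gumbel_copula(u1, u2, theta)


if __name__ == "__main__":
    # Hypothetical (xi, mu, sigma) for the two series of quarterly maxima.
    margins = ((0.1, 0.012, 0.005), (0.15, 0.014, 0.006))
    for theta in (1.0, 1.3, 3.0):
        print(theta, prob_any_stress((0.04, 0.05), margins, theta))
```

Stronger dependence (larger $\theta$) lowers the probability that at least one stress event occurs, because the two events then tend to occur in the same quarters.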


Notes and Comments 


Early works on distributions for bivariate extremes include Geffroy (1958), Tiago de 
Oliveira (1958) and Sibuya (1960). A selection of further important papers in the 
development of the subject include Galambos (1975), de Haan and Resnick (1977), 
Balkema and Resnick (1977), Deheuvels (1980) and Pickands (1981). The texts by 
Galambos (1987) and Resnick (2008) have both been influential; our presentation 
more closely resembles the former. 

Theorem 16.14 is proved in Galambos (1987): see Theorem 5.2.1 and Lemma 
5.4.1 therein (see also Joe 1997, p. 173). Theorem 16.15 is essentially a result of 
Pickands (1981). A complete version of the proof is given in Theorem 5.4.5 of 
Galambos (1987), although it is given in the form of a characterization of MEV
distributions with Gumbel margins. This is easily reformulated as a characterization 
of the EV copulas. In the bivariate case, necessary and sufficient conditions for A(w) 
in (16.16) to define a bivariate EV copula are given in Joe (1997, Theorem 6.4). 

The copula of Example 16.17 appears in Galambos (1975). A good summary 
of other bivariate and multivariate extreme value copulas is found in Kotz and 
Nadarajah (2000); they are presented as MEV distributions with unit Fréchet margins 
but the EV copulas are easily inferred from this presentation. See also Joe (1997, 
Chapters 5 and 6), in which EV copulas and their higher-dimensional extensions 
are discussed. Many parametric models for extremes have been suggested by Tawn 
(1988, 1990). 

Theorem 16.18 is found in Galambos (1987), where the necessary and sufficient
copula convergence criterion is given as $\lim_{n \to \infty} C^n(\mathbf{u}^{1/n}) = C_0(\mathbf{u})$ for positive
integers n; by noting that for any t > 0 we have the inequalities

$$C^{[t]+1}(\mathbf{u}^{1/[t]}) \leq C^t(\mathbf{u}^{1/t}) \leq C^{[t]}(\mathbf{u}^{1/([t]+1)}),$$

it can be inferred that this is equivalent to $\lim_{t \to \infty} C^t(\mathbf{u}^{1/t}) = C_0(\mathbf{u})$. Further
equivalent CDA conditions are found in Takahashi (1994). The idea of a domain of
attraction of an EV copula also appears in Abdous, Ghoudi and Khoudraji (1999).
Not every copula is in a copula domain of attraction; a counterexample may be
found in Schlather and Tawn (2002).

We have shown that pairwise asymptotic independence for the components of 
random vectors implies pairwise independence of the corresponding components 
in the limiting MEV distribution of the maxima. Pairwise independence for an 
MEV distribution in fact implies mutual independence, as recognized and described 
by a number of authors: see Galambos (1987, Corollary 5.3.1), Resnick (2008, 
Theorem 5.27), and the earlier work of Geffroy (1958) and Sibuya (1960). 


16.4 Multivariate Threshold Exceedances 


In this section we describe practically useful models for multivariate extremes 
(again in low-dimensional applications) that build on the basic idea of modelling 
excesses over high thresholds with the generalized Pareto distribution (GPD) as 
in Section 5.2. The idea is to use GPD-based tail models of the kind discussed in 
Section 5.2.3 together with appropriate copulas to obtain models for multivariate 
threshold exceedances. 


16.4.1 Threshold Models Using EV Copulas 


Assume that the vectors $\mathbf{X}_1, \dots, \mathbf{X}_n$ have unknown joint distribution
$F(\mathbf{x}) = C(F_1(x_1), \dots, F_d(x_d))$ for some unknown copula C and margins $F_1, \dots, F_d$, and
that F is in the domain of attraction of an MEV distribution. Much as in the univariate
case we would like to approximate the upper tail of $F(\mathbf{x})$ above some vector of
high thresholds $\mathbf{u} = (u_1, \dots, u_d)'$. The univariate theory of Sections 5.2.2 and 5.2.3
tells us that, for $x_j \geq u_j$ and $u_j$ high enough, the tail of the marginal distribution
$F_j$ may be approximated by a GPD-based functional form

$$\tilde{F}_j(x_j) = 1 - \lambda_j \left( 1 + \xi_j \frac{x_j - u_j}{\beta_j} \right)^{-1/\xi_j}, \tag{16.25}$$

where $\lambda_j = \bar{F}_j(u_j)$. This suggests that for $\mathbf{x} \geq \mathbf{u}$ we use the approximation
$F(\mathbf{x}) \approx C(\tilde{F}_1(x_1), \dots, \tilde{F}_d(x_d))$. But C is also unknown and must itself be
approximated in the tail. The following heuristic argument suggests that we should be able
to replace C by its limiting copula $C_0$.

The CDA condition (16.21) suggests that for any value $\mathbf{v} \in (0, 1)^d$ and t
sufficiently large we may make the approximation $C(\mathbf{v}^{1/t}) \approx C_0^{1/t}(\mathbf{v})$. If we now write
$\mathbf{w} = \mathbf{v}^{1/t}$ we have

$$C(\mathbf{w}) \approx C_0^{1/t}(\mathbf{w}^t) = C_0(\mathbf{w}) \tag{16.26}$$

by the scaling property of EV copulas. The approximation (16.26) will be best for
large values of $\mathbf{w}$, since $\mathbf{v}^{1/t} \to \mathbf{1}$ as $t \to \infty$.

We assume then that we can substitute the copula C with its EV limit $C_0$ in the
tail, and this gives us the overall model

$$\tilde{F}(\mathbf{x}) = C_0(\tilde{F}_1(x_1), \dots, \tilde{F}_d(x_d)), \quad \mathbf{x} \geq \mathbf{u}. \tag{16.27}$$

We complete the model specification by choosing a flexible and tractable parametric
EV copula for $C_0$. As before, the Gumbel copula family is particularly convenient.


16.4.2 Fitting a Multivariate Tail Model 


Assume we have observations $\mathbf{X}_1, \dots, \mathbf{X}_n$ from a df F with a tail that permits the
approximation (16.27). Of these observations, only a minority are likely to be in the
joint tail ($\mathbf{x} \geq \mathbf{u}$); other observations may exceed some of the individual thresholds
but lie below others. The usual way of making inferences about all the parameters
of such a model (the marginal parameters $\xi_j$, $\beta_j$, $\lambda_j$ for $j = 1, \dots, d$ and the copula
parameter (or parameter vector) $\boldsymbol{\theta}$) is to maximize a likelihood for censored data.

Let us suppose that $m_i$ components of the data vector $\mathbf{X}_i$ exceed their respective
thresholds in the vector $\mathbf{u}$. The only relevant information that the remaining
components convey is that they lie below their thresholds; such a component $X_{i,j}$ is said
to be censored at the value $u_j$. The contribution to the likelihood of $\mathbf{X}_i$ is given by

$$L_i = L_i(\xi_j, \beta_j, \lambda_j, \boldsymbol{\theta}; \mathbf{X}_i) = \left. \frac{\partial^{m_i} \tilde{F}(x_1, \dots, x_d)}{\partial x_{j_1} \cdots \partial x_{j_{m_i}}} \right|_{\max(\mathbf{X}_i, \mathbf{u})},$$

where the indices $j_1, \dots, j_{m_i}$ are those of the components of $\mathbf{X}_i$ exceeding their
thresholds.

For example, in a bivariate model with Gumbel copula (7.12), the likelihood
contribution would be

$$L_i = \begin{cases}
C^{\mathrm{Gu}}_\theta(1 - \lambda_1, 1 - \lambda_2), & X_{i,1} \leq u_1, \ X_{i,2} \leq u_2, \\
C^{\mathrm{Gu}}_{\theta,1}(\tilde{F}_1(X_{i,1}), 1 - \lambda_2) \, \tilde{f}_1(X_{i,1}), & X_{i,1} > u_1, \ X_{i,2} \leq u_2, \\
C^{\mathrm{Gu}}_{\theta,2}(1 - \lambda_1, \tilde{F}_2(X_{i,2})) \, \tilde{f}_2(X_{i,2}), & X_{i,1} \leq u_1, \ X_{i,2} > u_2, \\
c^{\mathrm{Gu}}_\theta(\tilde{F}_1(X_{i,1}), \tilde{F}_2(X_{i,2})) \, \tilde{f}_1(X_{i,1}) \tilde{f}_2(X_{i,2}), & X_{i,1} > u_1, \ X_{i,2} > u_2,
\end{cases}$$

where $\tilde{f}_j$ denotes the density of the univariate tail model $\tilde{F}_j$ in (16.25), $c^{\mathrm{Gu}}_\theta(u_1, u_2)$
denotes the Gumbel copula density, and $C^{\mathrm{Gu}}_{\theta,j}(u_1, u_2) := (\partial/\partial u_j) C^{\mathrm{Gu}}_\theta(u_1, u_2)$
denotes a conditional distribution of the copula, as in (7.17). The overall likelihood
is a product of such contributions and is maximized with respect to all parameters
of the marginal models and copula.


Table 16.2. Parameter estimates and standard errors (in brackets) for a bivariate
tail model fitted to exchange-rate return data; see Example 16.23 for details.

            $/€                $/¥
u           0.75               1.00
N_u         189                126
λ           0.094 (0.0065)     0.063 (0.0054)
ξ           -0.049 (0.066)     0.095 (0.11)
β           0.33 (0.032)       0.38 (0.053)
θ           1.10 (0.030)

In a simpler approach, parameters of the marginal GPD models could be estimated 
as in Section 5.2.3 and only the parameters of the copula obtained from the above 
likelihood. In fact, this is also a sensible way of getting starting values before going 
on to the global maximization over all parameters. 
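
The four cases of the bivariate censored likelihood translate directly into code. The following is a minimal sketch under our own naming conventions, assuming non-zero shape parameters; the closed-form partial derivative and density of the Gumbel copula are obtained by differentiating $\exp\{-((-\ln u_1)^\theta + (-\ln u_2)^\theta)^{1/\theta}\}$, and a practical implementation would pass the resulting log-likelihood to a numerical optimizer.

```python
import math


def tail_cdf(x, u, lam, xi, beta):
    """GPD-based tail approximation (16.25) for x >= u; assumes xi != 0."""
    return 1.0 - lam * (1.0 + xi * (x - u) / beta) ** (-1.0 / xi)


def tail_pdf(x, u, lam, xi, beta):
    """Density of the tail model (16.25) for x >= u."""
    return lam / beta * (1.0 + xi * (x - u) / beta) ** (-1.0 / xi - 1.0)


def _gumbel_parts(u1, u2, theta):
    x, y = -math.log(u1), -math.log(u2)
    s = x ** theta + y ** theta
    return x, y, s, math.exp(-s ** (1.0 / theta))


def gumbel_cdf(u1, u2, theta):
    return _gumbel_parts(u1, u2, theta)[3]


def gumbel_partial(u1, u2, theta, j):
    """Partial derivative of the Gumbel copula in argument j (1 or 2)."""
    x, y, s, c = _gumbel_parts(u1, u2, theta)
    z, uj = (x, u1) if j == 1 else (y, u2)
    return c * s ** (1.0 / theta - 1.0) * z ** (theta - 1.0) / uj


def gumbel_density(u1, u2, theta):
    x, y, s, c = _gumbel_parts(u1, u2, theta)
    return (c / (u1 * u2) * (x * y) ** (theta - 1.0)
            * s ** (1.0 / theta - 2.0) * (s ** (1.0 / theta) + theta - 1.0))


def likelihood_contribution(obs, margins, theta):
    """Censored-likelihood contribution of one bivariate observation.

    margins holds one (u, lam, xi, beta) tuple per component.
    """
    (x1, x2), (m1, m2) = obs, margins
    over1, over2 = x1 > m1[0], x2 > m2[0]
    v1 = tail_cdf(x1, *m1) if over1 else 1.0 - m1[1]  # censored at u_1 otherwise
    v2 = tail_cdf(x2, *m2) if over2 else 1.0 - m2[1]  # censored at u_2 otherwise
    if over1 and over2:
        return gumbel_density(v1, v2, theta) * tail_pdf(x1, *m1) * tail_pdf(x2, *m2)
    if over1:
        return gumbel_partial(v1, v2, theta, 1) * tail_pdf(x1, *m1)
    if over2:
        return gumbel_partial(v1, v2, theta, 2) * tail_pdf(x2, *m2)
    return gumbel_cdf(v1, v2, theta)


def log_likelihood(data, margins, theta):
    """Sum of log contributions over all observations."""
    return sum(math.log(likelihood_contribution(obs, margins, theta)) for obs in data)
```

For the two-step approach, the tuples in `margins` would be held fixed at their univariate estimates and `log_likelihood` maximized over `theta` alone.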

The model described by the likelihood (16.28) has been studied in some detail 
by Ledford and Tawn (1996) and a number of related models have been studied in 
the statistical literature on multivariate EVT (see Notes and Comments for more 
details). 


Example 16.23 (bivariate tail model for exchange-rate return data). We analyse
daily percentage falls in the value of the US dollar against the euro and the Japanese
yen, taking data for the eight-year period 1996-2003. We have 2008 daily returns
and choose to set thresholds at 0.75% and 1.00%, giving 189 and 126 exceedances,
respectively. In a full maximization of the likelihood over all parameters we obtained
the estimates and standard errors shown in Table 16.2. The value of the maximized
log-likelihood is -1064.7, compared with -1076.4 in a model where independence
in the tail is assumed (i.e. a Gumbel copula with $\theta = 1$), showing strong evidence
against an independence assumption.

We can now use the fitted model (16.27) to make various calculations about stress
events. For example, an estimate of the probability that on any given day the dollar
falls by more than 2% against both currencies is given by

$$p_{12} := 1 - \tilde{F}_1(2.00) - \tilde{F}_2(2.00) + C^{\mathrm{Gu}}_\theta(\tilde{F}_1(2.00), \tilde{F}_2(2.00)) = 0.000315,$$

with $\tilde{F}_j$ as in (16.25), making this approximately a 13-year event (assuming 250
trading days per year). The marginal probabilities of falls in value of this magnitude are
$p_1 := 1 - \tilde{F}_1(2.00) = 0.0014$ and $p_2 := 1 - \tilde{F}_2(2.00) = 0.0061$. We can use this
information to calculate so-called spillover probabilities for the conditional
occurrence of stress events; for example, the probability that the dollar falls 2% against
the yen given that it falls 2% against the euro is estimated to be $p_{12}/p_1 = 0.23$.
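
The stress-event calculations of this example can be reproduced approximately from the rounded estimates in Table 16.2. The sketch below is illustrative: the function names are our own, and because the inputs are rounded to two significant figures the outputs agree with the figures quoted in the example only to about that accuracy.

```python
import math

# (threshold u, exceedance probability lambda, shape xi, scale beta), Table 16.2
EURO = (0.75, 0.094, -0.049, 0.33)
YEN = (1.00, 0.063, 0.095, 0.38)
THETA = 1.10  # Gumbel copula parameter, Table 16.2


def tail_prob(x, u, lam, xi, beta):
    """Marginal exceedance probability 1 - F~(x) from (16.25), x >= u, xi != 0."""
    return lam * (1.0 + xi * (x - u) / beta) ** (-1.0 / xi)


def gumbel_copula(u1, u2, theta):
    s = (-math.log(u1)) ** theta + (-math.log(u2)) ** theta
    return math.exp(-s ** (1.0 / theta))


def joint_exceedance(x1, x2, m1, m2, theta):
    """Return (p1, p2, p12) for falls exceeding x1 and x2 on the same day."""
    p1, p2 = tail_prob(x1, *m1), tail_prob(x2, *m2)
    # 1 - F1 - F2 + C(F1, F2), rewritten in terms of exceedance probabilities
    p12 = p1 + p2 - 1.0 + gumbel_copula(1.0 - p1, 1.0 - p2, theta)
    return p1, p2, p12


if __name__ == "__main__":
    p1, p2, p12 = joint_exceedance(2.0, 2.0, EURO, YEN, THETA)
    print(f"p1 = {p1:.4f}, p2 = {p2:.4f}, p12 = {p12:.6f}")
    print(f"spillover p12/p1 = {p12 / p1:.2f}")
    print(f"return period = {1.0 / (250 * p12):.1f} years")
```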


16.4.3 Threshold Copulas and Their Limits


An alternative approach to multivariate extremes looks explicitly at the kind of 
copulas we get when we condition observations to lie above or below extreme 
thresholds. Just as the GPD is a natural limiting model for univariate threshold 
exceedances, so we can find classes of copula that are natural limiting models for 
the dependence structure of multivariate exceedances. 

The theory has been studied in most detail in the case of exchangeable bivariate 
copulas, and we concentrate on this case. Moreover, it proves slightly easier to 
switch our focus at this stage and first consider the lower-left tail of a probability 
distribution, before showing how the theory is adapted to the upper-right tail. 


Lower threshold copulas and their limits. Consider a random vector $(X_1, X_2)$
with continuous margins $F_1$ and $F_2$ and an exchangeable copula C. We consider
the distribution of $(X_1, X_2)$ conditional on both being below their v-quantiles, an
event we denote by $A_v = \{X_1 \leq F_1^{\leftarrow}(v), X_2 \leq F_2^{\leftarrow}(v)\}$, $0 < v \leq 1$. Assuming
$C(v, v) \neq 0$, the probability that $X_1$ lies below its $x_1$-quantile and $X_2$ lies below its
$x_2$-quantile conditional on this event is

$$P(X_1 \leq F_1^{\leftarrow}(x_1), X_2 \leq F_2^{\leftarrow}(x_2) \mid A_v) = \frac{C(x_1, x_2)}{C(v, v)}, \quad x_1, x_2 \in [0, v].$$

Considered as a function of $x_1$ and $x_2$ this defines a bivariate df on $[0, v]^2$, and by
Sklar's Theorem we can write

$$\frac{C(x_1, x_2)}{C(v, v)} = C^0_v(F_{(v)}(x_1), F_{(v)}(x_2)), \quad x_1, x_2 \in [0, v], \tag{16.29}$$

for a unique copula $C^0_v$ and continuous marginal distribution functions

$$F_{(v)}(x) = P(X_1 \leq F_1^{\leftarrow}(x) \mid A_v) = \frac{C(x, v)}{C(v, v)}, \quad 0 \leq x \leq v. \tag{16.30}$$

This unique copula may be written as

$$C^0_v(u_1, u_2) = \frac{C(F_{(v)}^{\leftarrow}(u_1), F_{(v)}^{\leftarrow}(u_2))}{C(v, v)}, \tag{16.31}$$

and it will be referred to as the lower threshold copula of C at level v. Juri and
Wüthrich (2002), who developed the approach we describe in this section, refer to
it as a lower tail dependence copula. It is of interest to attempt to evaluate limits
for this copula as $v \to 0$; such a limit will be known as a limiting lower threshold
copula.

Much like the GPD in Example 5.19, limiting lower threshold copulas must
possess a stability property under the operation of calculating lower threshold copulas
in (16.31). A copula C is a limiting lower threshold copula if, for any threshold
$0 < v \leq 1$, it satisfies

$$C^0_v(u_1, u_2) = C(u_1, u_2). \tag{16.32}$$

Example 16.24 (Clayton copula as limiting lower threshold copula). For the
standard bivariate Clayton copula in (7.13) we can easily calculate that $F_{(v)}$ in (16.30)
is

$$F_{(v)}(x) = \frac{(x^{-\theta} + v^{-\theta} - 1)^{-1/\theta}}{(2v^{-\theta} - 1)^{-1/\theta}}, \quad 0 \le x \le v,$$

and its inverse is

$$F_{(v)}^{\leftarrow}(u) = u\,(2v^{-\theta} - 1 + u^{\theta}(1 - v^{-\theta}))^{-1/\theta}, \quad 0 \le u \le 1.$$

The lower threshold copula for the Clayton copula can therefore be calculated
from (16.31) and it may be verified that this is again the Clayton copula. In other
words, the Clayton copula is a limiting lower threshold copula because (16.32)
holds.
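
The stability property claimed in this example can be checked numerically. The following is a minimal sketch, not part of the original derivation: it assumes the Clayton copula in the form $(u_1^{-\theta} + u_2^{-\theta} - 1)^{-1/\theta}$, obtains the inverse margin by solving $u = F_{(v)}(x)$ for $x$, and uses an illustrative parameter value $\theta = 2$ and an arbitrary grid; the function names are our own.

```python
import numpy as np

def clayton(u1, u2, theta):
    """Bivariate Clayton copula for theta > 0."""
    return (u1 ** -theta + u2 ** -theta - 1.0) ** (-1.0 / theta)

def margin_inverse(u, v, theta):
    """Inverse of the conditional margin F_(v) of (16.30) for the Clayton copula."""
    a = v ** -theta
    return u * (2.0 * a - 1.0 + u ** theta * (1.0 - a)) ** (-1.0 / theta)

def lower_threshold_copula(u1, u2, v, theta):
    """Lower threshold copula at level v, built as in (16.31)."""
    x1 = margin_inverse(u1, v, theta)
    x2 = margin_inverse(u2, v, theta)
    return clayton(x1, x2, theta) / clayton(v, v, theta)

theta = 2.0  # illustrative value, not taken from the text
grid = np.linspace(0.05, 0.95, 19)
u1, u2 = np.meshgrid(grid, grid)

# Largest absolute gap between the threshold copula and the Clayton copula
# itself over several threshold levels; it should be at rounding-error level.
max_gap = max(
    float(np.max(np.abs(lower_threshold_copula(u1, u2, v, theta)
                        - clayton(u1, u2, theta))))
    for v in (0.01, 0.1, 0.5, 0.9)
)
print(max_gap)
```

The gap is at the level of floating-point error for every threshold tried, which is what (16.32) requires.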


Upper threshold copulas. To define upper threshold copulas we consider again a
random vector $(X_1, X_2)$ with copula $C$ and margins $F_1$ and $F_2$. We now condition on
the event $A_v = \{X_1 > F_1^{\leftarrow}(v), X_2 > F_2^{\leftarrow}(v)\}$ for $0 \le v < 1$. We have the identity

$$P(X_1 > F_1^{\leftarrow}(x_1), X_2 > F_2^{\leftarrow}(x_2) \mid A_v) = \frac{\bar{C}(x_1, x_2)}{\bar{C}(v, v)}, \quad x_1, x_2 \in [v, 1].$$

Since $\bar{C}(x_1, x_2)/\bar{C}(v, v)$ defines a bivariate survival function on $[v, 1]^2$, by (7.14)
we can write

$$\frac{\bar{C}(x_1, x_2)}{\bar{C}(v, v)} = \hat{C}_v^1(\bar{G}_{(v)}(x_1), \bar{G}_{(v)}(x_2)), \quad x_1, x_2 \in [v, 1], \tag{16.33}$$

for some survival copula $\hat{C}_v^1$ of a copula $C_v^1$ and marginal survival functions

$$\bar{G}_{(v)}(x) = P(X_1 > F_1^{\leftarrow}(x) \mid A_v) = \frac{\bar{C}(x, v)}{\bar{C}(v, v)}, \quad v \le x \le 1. \tag{16.34}$$

The copula $C_v^1$ is known as the upper threshold copula at level $v$ and it is now
of interest to find limits as $v \to 1$, which are known as limiting upper threshold
copulas. In fact, as the following lemma shows, it suffices to study either lower or
upper threshold copulas because results for one follow easily from results for the
other.


Lemma 16.25. The survival copula of the upper threshold copula of $C$ at level $v$ is
the lower threshold copula of $\hat{C}$ at level $1 - v$.


Proof. We use the identity $\bar{C}(u_1, u_2) = \hat{C}(1 - u_1, 1 - u_2)$ and (16.34) to rewrite
(16.33) as

$$\frac{\hat{C}(1 - x_1, 1 - x_2)}{\hat{C}(1 - v, 1 - v)} = \hat{C}_v^1\left(\frac{\hat{C}(1 - x_1, 1 - v)}{\hat{C}(1 - v, 1 - v)}, \frac{\hat{C}(1 - v, 1 - x_2)}{\hat{C}(1 - v, 1 - v)}\right).$$

Writing $y_1 = 1 - x_1$, $y_2 = 1 - x_2$ and $w = 1 - v$ we have

$$\frac{\hat{C}(y_1, y_2)}{\hat{C}(w, w)} = \hat{C}_v^1\left(\frac{\hat{C}(y_1, w)}{\hat{C}(w, w)}, \frac{\hat{C}(w, y_2)}{\hat{C}(w, w)}\right), \quad y_1, y_2 \in [0, w],$$

and comparison with (16.29) and (16.30) shows that $\hat{C}_v^1$ must be the lower threshold
copula of $\hat{C}$ at the level $w = 1 - v$.

It follows that the survival copulas of limiting lower threshold copulas are limiting
upper threshold copulas. The Clayton survival copula is a limiting upper threshold
copula.


Relationship between limiting threshold copulas and EV copulas. We give one 
result that shows how limiting upper threshold copulas may be calculated for under- 
lying exchangeable copulas C that are in the domain of attraction of EV copulas 
with tail dependence, thus linking the study of threshold copulas to the theory of 
Section 16.3.3. 


Theorem 16.26. If $C$ is an exchangeable copula with upper tail-dependence coefficient
$\lambda_u > 0$ satisfying $C \in \mathrm{CDA}(C_0)$, then $C$ has a limiting upper threshold copula
that is the survival copula of the df

$$G(x_1, x_2) = \frac{(x_1 + x_2)\bigl(1 - A(x_1/(x_1 + x_2))\bigr)}{\lambda_u}, \tag{16.35}$$

where $A$ is the dependence function of $C_0$. Also, $\hat{C}$ has a limiting lower threshold
copula that is the copula of $G$.


Example 16.27 (upper threshold copula of Galambos copula). We use this result
to calculate the limiting upper threshold copula for the Galambos copula. We recall
that this is an EV copula with dependence function given in (16.19) and consider the
standard exchangeable case with $\alpha = \beta = 1$. Using the methods of Section 7.2.4 it
may easily be calculated that the coefficient of upper tail dependence of this copula
is $\lambda_u = 2^{-1/\theta}$. The bivariate distribution $G(x_1, x_2)$ in (16.35) is therefore given by

$$G(x_1, x_2) = \left(\tfrac{1}{2}(x_1^{-\theta} + x_2^{-\theta})\right)^{-1/\theta}, \quad (x_1, x_2) \in (0, 1]^2,$$

the copula of which is the Clayton copula. The limiting upper threshold copula
in this case is therefore the Clayton survival copula. Moreover, the limiting lower
threshold copula of the Galambos survival copula is the Clayton copula.
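
The calculation in this example can be reproduced numerically. The sketch below is illustrative only and rests on assumptions of our own: the exchangeable Galambos dependence function is taken to be $A(w) = 1 - (w^{-\theta} + (1 - w)^{-\theta})^{-1/\theta}$, the tail-dependence coefficient of an EV copula is computed as $\lambda_u = 2 - 2A(1/2)$, and $\theta = 1.5$ is an arbitrary test value. The code evaluates the df $G$ of Theorem 16.26, extracts its copula by inverting the margin $G(x, 1)$, and compares the result with the Clayton copula.

```python
import numpy as np

def galambos_A(w, theta):
    """Dependence function of the exchangeable Galambos copula (assumed form)."""
    return 1.0 - (w ** -theta + (1.0 - w) ** -theta) ** (-1.0 / theta)

def G(x1, x2, theta):
    """The df of (16.35) built from the Galambos dependence function."""
    lam_u = 2.0 - 2.0 * galambos_A(0.5, theta)  # upper tail-dependence coefficient
    s = x1 + x2
    return s * (1.0 - galambos_A(x1 / s, theta)) / lam_u

def G_margin_inverse(u, theta):
    """Inverse of the margin G(x, 1) = ((x^-theta + 1) / 2)^(-1/theta)."""
    return (2.0 * u ** -theta - 1.0) ** (-1.0 / theta)

def clayton(u1, u2, theta):
    """Bivariate Clayton copula for theta > 0."""
    return (u1 ** -theta + u2 ** -theta - 1.0) ** (-1.0 / theta)

theta = 1.5  # illustrative value
grid = np.linspace(0.05, 0.95, 19)
u1, u2 = np.meshgrid(grid, grid)

# Copula of G: evaluate G at the marginal quantiles of u1 and u2.
copula_of_G = G(G_margin_inverse(u1, theta), G_margin_inverse(u2, theta), theta)
max_gap = float(np.max(np.abs(copula_of_G - clayton(u1, u2, theta))))
lam_u = 2.0 - 2.0 * galambos_A(0.5, theta)
print(max_gap, lam_u, 2.0 ** (-1.0 / theta))
```

Under these assumptions the copula of $G$ agrees with the Clayton copula up to rounding error, and the computed $\lambda_u$ matches $2^{-1/\theta}$.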


The Clayton copula turns out to be an important attractor for a large class of
underlying exchangeable copulas. Juri and Wüthrich (2003) have shown that all
Archimedean copulas whose generators are regularly varying at 0 with negative
parameter (meaning that $\phi(t)$ satisfies $\lim_{t \to 0} \phi(xt)/\phi(t) = x^{-\alpha}$ for all $x > 0$ and some
$\alpha > 0$) share the Clayton copula $C_\alpha^{\mathrm{Cl}}$ as their limiting lower threshold copula.

It is of interest to calculate limiting lower and upper threshold copulas for the 
t copula, and this can be done using Theorem 16.26 and the expression for the 
dependence function in (16.24). However, the resulting limit is not convenient for 
practical purposes because of the complexity of this dependence function. We have 
already remarked in Section 16.3.4 that the dependence function of the t-EV copula
can be well approximated by the dependence functions of other exchangeable EV 
copulas, such as Gumbel and Galambos, for most practical purposes. Theorem 16.26 
therefore suggests that instead of working with the true limiting upper threshold 
copula of the t copula we could instead work with the limiting upper threshold 
copula of, say, the Galambos copula, i.e. the Clayton survival copula. Similarly, we 
could work with the Clayton copula as an approximation for the true limiting lower 
threshold copula of the t copula.

Limiting threshold copulas in practice. Limiting threshold copulas in dimensions
higher than two have not yet been extensively studied, nor have limits for
non-exchangeable bivariate copulas or limits when we define two thresholds $v_1$ and $v_2$
and let these tend to zero (or one) at different rates. The practical use of these ideas
is therefore largely confined to bivariate applications when thresholds are set at 
approximately similar quantiles and a symmetric dependence structure is assumed. 

Let us consider a situation where we have a bivariate distribution that appears to 
exhibit tail dependence in both the upper-right and lower-left corners. While true 
lower and upper limiting threshold copulas may exist for this unknown distribution, 
we could in practice simply adopt a tractable and flexible parametric limiting thresh- 
old copula family. It is particularly easy to use the Clayton copula and its survival 
copula as lower and upper limits, respectively. 

Suppose, for example, that we set high thresholds at $u = (u_1, u_2)'$, so that $P(X_1 >
u_1) \approx P(X_2 > u_2)$ and both probabilities are small. For the conditional distribution
of $(X_1, X_2)$ over the threshold $u$ we could assume a model of the form

$$P(X \le x \mid X > u) \approx \hat{C}_\theta^{\mathrm{Cl}}(G_{\xi_1,\beta_1}(x_1 - u_1), G_{\xi_2,\beta_2}(x_2 - u_2)), \quad x > u,$$

where $\hat{C}_\theta^{\mathrm{Cl}}$ is the Clayton survival copula and $G_{\xi_j,\beta_j}$ denotes a GPD, as defined
in 5.16. Inference about the model parameters $(\theta, \xi_1, \beta_1, \xi_2, \beta_2)$ would be based on
the exceedance data above the thresholds and would use the methods discussed in
Section 7.5.

Similarly, for a vector of low thresholds $u$ satisfying $P(X_1 \le u_1) \approx P(X_2 \le u_2)$
with both these probabilities small, we could approximate the conditional distribution
of $(X_1, X_2)$ below the threshold $u$ by a model of the form

$$P(X \le x \mid X < u) \approx C_\theta^{\mathrm{Cl}}(\bar{G}_{\xi_1,\beta_1}(u_1 - x_1), \bar{G}_{\xi_2,\beta_2}(u_2 - x_2)), \quad x < u,$$

where $C_\theta^{\mathrm{Cl}}$ is the Clayton copula and $\bar{G}_{\xi_j,\beta_j}$ denotes a GPD survival function. Inference
about the model parameters would be based on the data below the thresholds
and would use the methods of Section 7.5.
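
To make the upper-tail model concrete, the following sketch simulates joint exceedances from a Clayton survival copula with GPD margins. It is an illustration under assumptions of our own, not a fitting procedure: the Clayton copula is sampled with the standard gamma-frailty construction, the GPD quantile function $(\beta/\xi)((1 - p)^{-\xi} - 1)$ is used for $\xi \ne 0$, and all parameter values and function names are hypothetical.

```python
import numpy as np

def sample_clayton(n, theta, rng):
    """Draw n pairs from the Clayton copula via a gamma frailty variable."""
    frailty = rng.gamma(shape=1.0 / theta, scale=1.0, size=(n, 1))
    e = rng.exponential(size=(n, 2))
    return (1.0 + e / frailty) ** (-1.0 / theta)

def gpd_quantile(p, xi, beta):
    """Quantile function of the GPD with shape xi != 0 and scale beta > 0."""
    return beta / xi * ((1.0 - p) ** -xi - 1.0)

def sample_upper_exceedances(n, theta, u, xi, beta, rng):
    """Simulate (X1, X2) given X > u under the Clayton-survival/GPD model."""
    w = 1.0 - sample_clayton(n, theta, rng)  # sample from the survival copula
    return u + gpd_quantile(w, xi, beta)     # add GPD-distributed excesses

# Hypothetical thresholds and GPD parameters, chosen only for illustration.
u = np.array([0.02, 0.03])
xi = np.array([0.2, 0.3])
beta = np.array([0.010, 0.015])

rng = np.random.default_rng(0)
x = sample_upper_exceedances(2000, 2.0, u, xi, beta, rng)
print(x[:3])
```

Sampling the lower-tail model works in the same way, with the Clayton copula itself in place of its survival copula and the excesses subtracted from the thresholds.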


Notes and Comments


The GPD-based tail model (16.27) and inference for censored data using a likeli- 
hood of the form (16.28) have been studied by Ledford and Tawn (1996), although 
the derivation of the model uses somewhat different asymptotic reasoning based on 
a characterization of multivariate domains of attraction of MEV distributions with 
unit Fréchet margins found in Resnick (2008). The authors of the former paper con- 
centrate on the model with Gumbel (logistic) dependence structure and discuss, in 
particular, testing for asymptotic independence of extremes. Likelihood inference is 
non-problematic (the problem being essentially regular) when $\theta > 0$ and $\xi_j > -\tfrac{1}{2}$,
but testing for independence of extremes $\theta = 1$ is not quite so straightforward since
this is a boundary point of the parameter space. This case is possibly more interest- 
ing in environmental applications than in financial ones, where we tend to expect 
dependence of extreme values. 

A related bivariate GPD model is presented in Smith, Tawn and Coles (1997). In
our notation they essentially consider a model of the form

$$F(x_1, \ldots, x_d) = 1 + \ln C_0\bigl(e^{\tilde{F}_1(x_1) - 1}, \ldots, e^{\tilde{F}_d(x_d) - 1}\bigr), \quad x \ge k,$$


where Co is an extreme value copula. This model is also discussed in Smith (1994) 
and Ledford and Tawn (1996); it is pointed out that F does not reduce to a product 
of marginal distributions in the case when Co is the independence copula, unlike the 
model in (16.27). 

Another style of statistical model for multivariate extremes is based on the point 
process theory of multivariate extremes developed in de Haan (1985), de Haan and 
Resnick (1977) and Resnick (2008). Statistical models using this theory are found 
in Coles and Tawn (1991) and Joe, Smith and Weissman (1992); see also the texts of 
Joe (1997) and Coles (2001). New approaches to modelling multivariate extremes 
can be found in Heffernan and Tawn (2004) and Balkema and Embrechts (2007); 
the latter paper considers applications to stress testing high-dimensional portfolios 
in finance. 

Limiting threshold copulas are studied in Juri and Wüthrich (2002, 2003). In
the latter paper it is demonstrated that the Clayton copula is an attractor for the 
threshold copulas of a wide class of Archimedean copulas; moreover, a version of our 
Theorem 16.26 is proved. Limiting threshold copulas for the t copula are investigated 
in Demarta and McNeil (2005). The usefulness of Clayton’s copula and the Clayton 
survival copula for describing the dependence in the tails of bivariate financial return 
data was confirmed in a large-scale empirical study of high-frequency exchange-rate 
returns by Breymann, Dias and Embrechts (2003). 


17 Dynamic Portfolio Credit Risk Models and Counterparty Risk


This chapter is concerned with dynamic reduced-form models of portfolio credit risk. 
It is also the natural place for an analysis of counterparty credit risk for over-the- 
counter (OTC) credit derivatives, since this risk can only be satisfactorily modelled 
in the framework of dynamic models. 

In Section 17.1 we give an informal introduction to the subject of dynamic models 
in which we explain why certain risk-management tasks for portfolios of credit 
derivatives cannot be properly handled in the copula framework of Chapter 12; we 
also give an overview of the different types of dynamic model used in portfolio 
credit risk. 

A detailed analysis of counterparty credit risk management is presented in Sec- 
tion 17.2, while the remainder of the chapter focusses on two different approaches 
to dynamic modelling. In Section 17.3 we consider dynamic models with condition- 
ally independent default times, and in Section 17.4 we treat credit risk models with 
incomplete information. 

We use a number of concepts and techniques from continuous-time finance and 
build on material in Chapters 10 and 12. Particular prerequisites for reading this 
chapter are Sections 10.5 and 10.6 on pricing single-name credit derivatives in 
models with stochastic hazard rates and Section 12.3 on CDO pricing in factor 
copula models. 


17.1 Dynamic Portfolio Credit Risk Models 
17.1.1 Why Dynamic Models of Portfolio Credit Risk? 


In the copula models of portfolio credit risk in Chapter 12, the joint distribution of the 
default times was specified directly. However, the evolution of this distribution over 
time, for instance in reaction to new economic information, was not modelled. While 
the copula approach is sufficient for computing the prices of many credit products 
such as index swaps or CDO tranches at a given point in time t, it is not possible to
say anything about the dynamics of these prices over time. Certain important tasks 
in the risk management of credit derivative portfolios cannot therefore be handled 
properly in the copula framework, as we now discuss. 

To begin with, in a copula model it is not possible to price options on credit-risky 
instruments such as options on a basket of corporate bonds. This is an issue of 
practical relevance, since the task of pricing an option on a credit-risky instrument
arises in the management of counterparty credit risk, particularly in
the computation of a so-called credit value adjustment (CVA) for a credit derivative. 
Consider, for instance, a protection buyer who has entered into a CDS contract with 
some protection seller. Suppose now that the protection seller defaults during the 
lifetime of the CDS contract and that the credit spread of the reference entity has 
gone up. In that case the protection buyer suffers a loss, since in order to renew his 
protection he has to enter into a new CDS at a higher spread. In order to account for 
this loss the protection buyer should make an adjustment to the value of the CDS. 
We will show in Section 17.2 that, roughly speaking, this adjustment takes the form 
of an option on the value of the future cash flows of the CDS with maturity date 
equal to the default time of the protection seller. 

Moreover, in copula models it is not possible to derive model-based dynamic 
hedging strategies. For this reason risk managers often resort to sensitivity-based 
hedging strategies that are similar to the use of duration-based immunization for 
the risk management of bond portfolios. This is not entirely satisfactory, since it 
is known from markets for other types of derivatives that hedging strategies that 
lack a proper theoretical foundation may perform poorly (see Notes and Comments 
for references). On the theoretical side, the non-existence of model-based hedging 
strategies implies that pricing results in copula models are not supported by hedg- 
ing arguments, so that an important insight of modern derivative asset analysis is 
ignored in standard CDO pricing. Of course, dynamic trading strategies are sub- 
stantially more difficult to implement in credit-derivative markets than in equity 
markets and their performance is less robust with respect to model misspecification. 
However, in our view this should not serve as an excuse for neglecting the issue of 
dynamic modelling and dynamic hedging altogether. A number of recent papers on 
the hedging of portfolio credit derivatives are listed in Notes and Comments. 


17.1.2 Classes of Reduced-Form Models of Portfolio Credit Risk 


The properties of a reduced-form model of portfolio credit risk are essentially deter- 
mined by the intensities of the default times. Different models can be classified 
according to the mathematical structure given to these intensities. The concept of 
the intensity of a random default time was encountered in Definition 10.15 and will 
be explained in more detail in Section 17.3.1. 

The simplest reduced-form portfolio models are models with conditionally inde- 
pendent defaults (see Section 17.3). These models are a straightforward multi-firm 
extension of models with doubly stochastic default times. Default times are modelled
as conditionally independent given the realization of some observable economic
background process $(\Psi_t)$. The default intensities are adapted to the filtration
generated by $(\Psi_t)$, and dependence between defaults is generated by the mutual
dependence of the default intensities on the common background process. An important
special case is the class of models with Markov-modulated intensities where
$\lambda_{t,i} = \lambda_i(\Psi_t)$ and $(\Psi_t)$ is Markovian.
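
A small simulation may help to fix ideas. The sketch below is a toy illustration under assumptions of our own, not the model developed in Section 17.3: the background process is a two-state Markov chain (a calm state 0 and a stressed state 1), all firms share the same intensity function, time is discretized on a grid of step `dt`, and each default time is generated as the first time the cumulated intensity exceeds an independent standard exponential threshold. Function and parameter names are hypothetical.

```python
import numpy as np

def simulate_defaults(n_firms, horizon, dt, q, lam, rng):
    """Simulate default times with a Markov-modulated common intensity.

    q[k]   : rate at which the background chain leaves state k (k = 0, 1)
    lam[k] : default intensity of every firm while the chain is in state k
    Returns an array of default times (np.inf for firms surviving the horizon).
    """
    n_steps = int(round(horizon / dt))
    thresholds = rng.exponential(size=n_firms)  # independent of the chain
    default_times = np.full(n_firms, np.inf)
    state, cum_hazard = 0, 0.0
    for step in range(n_steps):
        cum_hazard += lam[state] * dt           # integrated intensity so far
        newly = np.isinf(default_times) & (thresholds <= cum_hazard)
        default_times[newly] = (step + 1) * dt
        if rng.random() < q[state] * dt:        # approximate switching probability
            state = 1 - state
    return default_times

rng = np.random.default_rng(1)
times = simulate_defaults(n_firms=100, horizon=5.0, dt=0.01,
                          q=(0.2, 1.0), lam=(0.01, 0.20), rng=rng)
n_defaults = int(np.isfinite(times).sum())
print(n_defaults)
```

Given the path of the chain the thresholds are independent, so the defaults are conditionally independent; unconditionally they are dependent because every firm is exposed to the same stressed periods.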

In Section 17.4 we consider reduced-form models with incomplete information.
We consider a set-up where the default times are independent given a common
factor. In the simplest case this factor is simply a random variable $V$, as in the
factor copula models studied in Section 12.2.2. Assume for the moment that $V$ is
observable. Then defaults are independent and the default intensities have the simple
form $\lambda_{t,i} = \gamma_i(V, t)$ for suitable functions $\gamma_i$. We assume, however, that the factor
is not directly observable. Instead investors are only able to observe the default
history and, at most, an auxiliary process representing noisy observations of $V$. In
this context it will be shown that the default intensities are computed by projection.
Denoting the default intensity of firm $i$ at time $t$ by $\lambda_{t,i}$ and the information available
to investors at $t$ by the $\sigma$-field $\mathcal{G}_t$, we have that

$$\lambda_{t,i} = E(\gamma_i(V, t) \mid \mathcal{G}_t). \tag{17.1}$$


Moreover, prices of many credit derivatives are given by similar conditional expec- 
tations. We will see that Bayesian updating and, more generally, stochastic filtering 
techniques can be employed in the evaluation of expressions of the form (17.1) and 
in the analysis of the model in general. 
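
The projection can be illustrated in the simplest possible setting. The sketch below rests on assumptions of our own and is not the filtering machinery of Section 17.4: the factor $V$ takes finitely many values, every firm has the same time-constant intensity `gamma[k]` when $V$ is in state `k`, and investors observe only the default history. Bayes' rule then gives the posterior of $V$ from the likelihood of the observed default and survival times, and the investor intensity is the posterior mean of the intensity.

```python
import numpy as np

def posterior(prior, gamma, t, default_times):
    """Posterior distribution of a finite-state factor V given defaults up to t.

    default_times holds one entry per firm, with np.inf for firms not yet
    defaulted. Given V = k, default times are i.i.d. exponential with rate gamma[k].
    """
    defaulted = default_times <= t
    n_def = int(defaulted.sum())
    exposure = float(np.where(defaulted, default_times, t).sum())  # total time at risk
    log_lik = n_def * np.log(gamma) - gamma * exposure
    w = prior * np.exp(log_lik - log_lik.max())  # subtract max for stability
    return w / w.sum()

def projected_intensity(prior, gamma, t, default_times):
    """Investor intensity of a surviving firm: posterior mean of gamma(V)."""
    return float(np.dot(gamma, posterior(prior, gamma, t, default_times)))

prior = np.array([0.5, 0.5])    # hypothetical prior on the two states of V
gamma = np.array([0.01, 0.05])  # hypothetical intensities in each state

no_default = np.full(10, np.inf)
one_default = no_default.copy()
one_default[0] = 1.0            # firm 0 defaults at time 1

before = projected_intensity(prior, gamma, 1.0, no_default)
after = projected_intensity(prior, gamma, 1.0, one_default)
print(before, after)
```

With these inputs the intensity of the survivors is higher once the default at time 1 is observed than it would be without it, because the default shifts posterior mass towards the high-intensity state of $V$.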

A further important model class comprises models with interacting intensities. We 
will not present these in any detail for reasons of space but important contributions 
to the literature are listed in Notes and Comments. In these models the impact of the 
default of one firm on the default intensities of surviving firms is specified explicitly. 
For instance, we might assume that, after a default event, default intensities increase 
by 10% of their pre-default value. This interaction among intensities provides an 
alternative mechanism for creating dependence between default events. In formal 
terms, models with interacting intensities are constructed as Markov chains on the set 
of all possible default states of the portfolio. The theory of Markov chains therefore 
plays an important role in their analysis. 
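
As a rough illustration of this mechanism, the sketch below simulates a homogeneous portfolio in which every surviving firm's intensity is multiplied by $1 + \text{jump}$ at each default, so that with a jump of 0.10 intensities rise by 10% of their pre-default value. The homogeneity, the parameter values and the function name are assumptions made for the example. Under these assumptions the number of defaults is a pure-birth Markov chain, and the waiting time to the next default is exponential with the total intensity of the survivors.

```python
import numpy as np

def simulate_default_times(n_firms, lam0, jump, horizon, rng):
    """Ordered default times up to the horizon in a homogeneous contagion model.

    After k defaults each survivor has intensity lam0 * (1 + jump) ** k.
    """
    t, k, times = 0.0, 0, []
    while k < n_firms:
        total_rate = (n_firms - k) * lam0 * (1.0 + jump) ** k
        t += rng.exponential(1.0 / total_rate)  # waiting time to next default
        if t > horizon:
            break
        k += 1
        times.append(t)
    return times

rng = np.random.default_rng(0)
mean_contagion = np.mean(
    [len(simulate_default_times(50, 0.05, 0.10, 5.0, rng)) for _ in range(2000)]
)
mean_no_contagion = np.mean(
    [len(simulate_default_times(50, 0.05, 0.0, 5.0, rng)) for _ in range(2000)]
)
print(mean_contagion, mean_no_contagion)
```

Setting the jump to zero recovers independent defaults, which gives a benchmark against which the effect of the interaction on the number of defaults can be judged.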

A common feature of models with interacting default intensities and models with 
incomplete information is the presence of default contagion. This means that the 
default intensity of a surviving firm jumps (usually upwards) given the information 
that some other firm has defaulted. As a consequence, the credit spread of surviving 
firms increases when default events occur in the portfolio. Default contagion can 
arise via different channels. On the one hand, contagious effects might be due to 
direct economic links between firms, such as a close business relationship or a 
strong borrower—lender relationship. For instance, the default probability of a bank 
is likely to increase if one of its major borrowers defaults. Broadly speaking, this 
channel of default interaction is linked to counterparty risk. On the other hand, 
changes in the conditional default probability of non-defaulted firms can be caused 
by information effects; investors might revise their estimate of the financial health 
of non-defaulted firms in light of the news that a particular firm has defaulted. This 
is known as information-based default contagion. Note that models with incomplete 
information generate information-based default contagion by design. At a default 
event the conditional distribution of the factor V given the investor information 
is updated; by (17.1) this leads to a jump in the default intensity $\lambda_{t,i}$. There is in
fact substantial evidence for contagion effects. A good example is provided by the
default of the investment bank Lehman Brothers in autumn 2008; the default event 
combined with the general nervousness caused by the worsening financial crisis sent 
credit spreads to unprecedentedly high levels. 

We can also distinguish between bottom-up and top-down models of portfolio 
credit risk. This distinction relates to the quantities that are modelled and cuts through 
all types of reduced-form portfolio credit risk models including the copula models 
of Chapter 12. 

The fundamental objects that are modelled in a bottom-up model are the default 
indicator processes of the individual firms in the portfolio under consideration; the 
dynamics of the portfolio loss are then derived from these. In this approach it is 
possible to price portfolio products consistently with observed single-name CDS 
spreads and to derive hedging strategies for portfolio products that use single-name 
CDSs as hedging instruments. These are obvious advantages of this model class. 
However, there are also some drawbacks related to tractability: in the bottom-up 
approach we have to keep track of all default-indicator processes and possibly also 
background processes driving the model. This typically leads to substantial com- 
putational challenges in pricing and model calibration, particularly if the portfolio 
size is fairly large. 

In top-down models, on the other hand, the portfolio loss process is modelled 
directly, without reference to the individual default indicator processes. This obvi- 
ously drastically reduces the dimensionality of the resulting models. It can be argued 
that top-down models are sufficient for the pricing of index derivatives, since the pay- 
off of these contracts depends only on the portfolio loss. However, in this model class 
the information contained in single-name spreads is neglected for pricing purposes, 
and it is not possible to study the hedging of portfolio derivatives with single-name 
CDSs. 

There is no obvious and universally valid answer to the question of which model 
class should be preferred; in Notes and Comments we provide a few references 
in which this issue is discussed further. In our own analysis we concentrate on 
bottom-up models. 


Notes and Comments 


The limitations of static copula models are discussed in a number of research papers; 
a particularly readable contribution is Shreve (2009). An interesting collection of 
papers that deal with portfolio credit risk models “after copulas” is found in Lipton 
and Rennie (2008). Dynamic hedging strategies for portfolio credit derivatives are 
studied by Frey and Backhaus (2010), Laurent, Cousin and Fermanian (2011) and 
Cont and Kan (2011), among others; an earlier contribution is Bielecki, Jeanblanc 
and Rutkowski (2004). A detailed mathematical analysis of hedging errors for equity 
and currency derivatives is given in El Karoui, Jeanblanc-Picqué and Shreve (1998). 

There is a rich literature on models with interacting intensities. Bottom-up models
are considered by Davis and Lo (2001), Jarrow and Yu (2001), Yu (2007), Frey 
and Backhaus (2008) and Herbertsson (2008). Top-down models with interacting 
intensities include the contributions by Arnsdorf and Halperin (2009) and Cont and
Minca (2013). Moreover, there are top-down models where the dynamics of the
whole “surface” of CDO tranche spreads (that is, the dynamics of CDO spreads for
all maturities and attachment points) are modelled directly: see, for example, Ehlers
and Schönbucher (2009), Sidenius, Piterbarg and Andersen (2008) and Filipović,
Overbeck and Schmidt (2011). The modelling philosophy of these three papers is 
akin to the well-known HJM models for the term structure of interest rates. A general 
discussion of the pros and cons of bottom-up and top-down models can be found 
in Bielecki, Crépey, and Jeanblanc (2010) (see also Giesecke, Goldberg and Ding 
2011). 

Credit risk models with explicitly specified interactions between default intensi- 
ties are conceptually close to network models and to models for interacting particle 
systems developed in statistical physics. Föllmer (1994) contains an inspiring
discussion of the relevance of these ideas to financial modelling; the link to credit risk is
explored by Giesecke and Weber (2004, 2006) and Horst (2007). Network models 
are frequently used for the study of systemic risk in financial networks, an issue 
that has become highly relevant in the aftermath of the financial crisis of 2007-9. 
Interesting contributions in this rapidly growing field include the papers by Eisen- 
berg and Noe (2001), Elsinger, Lehar and Summer (2006), Gai and Kapadia (2010), 
Upper (2011) and Amini, Cont and Minca (2012). 

There are some empirical papers on default contagion. Jarrow and Yu (2001) pro- 
vide anecdotal evidence for counterparty-risk-related contagion in small portfolios. 
In contrast, Lando and Nielsen (2010) find no strong empirical evidence for default 
contagion in historical default patterns. 

Other work has tested the impact of the default or spread widening of a given firm 
on the credit spreads or stock returns of other firms: see Collin-Dufresne, Goldstein 
and Helwege (2010) or Lang and Stulz (1992). The evidence in favour of this type 
of default contagion is quite strong. For instance, Collin-Dufresne, Goldstein and 
Helwege (2010) found that, even after controlling for other macroeconomic variables 
influencing bond returns, the returns of large corporate bond indices in months where 
one or more large firms experienced a significant widening in credit spreads (above 
200 basis points) were significantly lower than the returns of these indices in other 
months; this is clear evidence supporting contagion. 


17.2 Counterparty Credit Risk Management 


A substantial proportion of all derivative transactions are carried out OTC, so that 
counterparty credit risk is a key issue for financial institutions. The management of 
counterparty risk poses a number of challenges. To begin with, a financial institution 
needs to measure (in close to real time) its counterparty risk exposure to its various 
trading partners. Moreover, counterparty risk needs to be taken into account in the 
pricing of derivative contracts, which leads to the issue of computing credit value 
adjustments. Finally, financial institutions and major corporations apply various risk- 
mitigation strategies in order to control and reduce their counterparty risk exposure. 
In particular, many OTC derivative transactions are now collateralized. 

Consider a derivative transaction such as a CDS contract between two parties
(the protection seller S and the protection buyer B) and suppose that the deal is
collateralized. If the value of the CDS is negative for, say, S, then S passes cash or
securities (the collateral) to B. If S defaults before the maturity of the underlying
CDS and if the value of the CDS at the default time $\tau_S$ of S is positive for B, the
protection buyer is permitted to liquidate the collateral in order to reduce the loss 
due to the default of S; excess collateral must be returned to S. Most collateralization 
agreements are symmetric so that the roles of S and B can be exchanged when the 
value of the underlying CDS changes its sign. 

In this section we study quantitative aspects of counterparty risk management. 
In Section 17.2.1 we introduce the general form of credit value adjustments for 
uncollateralized derivative transactions and we discuss various simplifications that 
are used in practice. In Section 17.2.2 we consider the case of collateralized transac- 
tions. For concreteness, we discuss value adjustments and collateralization strategies 
for a single-name CDS, but our arguments apply to other contracts as well. 


17.2.1 Uncollateralized Value Adjustments for a CDS 


We begin with an analysis of the form of credit value adjustments for an uncollateralized
single-name CDS contract on some reference entity R. We work on a filtered
probability space $(\Omega, \mathcal{F}, (\mathcal{G}_t), Q)$, where $Q$ denotes the risk-neutral measure used
for pricing derivatives and where the filtration $(\mathcal{G}_t)$ represents the information available
to investors. Our notation is as follows: the default times of the protection seller
S, the protection buyer B and the reference entity R are denoted by the $(\mathcal{G}_t)$ stopping
times $\tau_S$, $\tau_B$ and $\tau_R$; $\delta^R$, $\delta^S$ and $\delta^B$ are the losses given default (LGDs) of the
reference entity and the contracting parties; $T_1 = \min\{\tau_R, \tau_S, \tau_B\}$ denotes the first
default time; $\xi_1 \in \{R, S, B\}$ gives the identity of the firm that defaults first. We assume
that $\delta^R$, $\delta^S$ and $\delta^B$ are
constant; for a discussion of the calibration of these parameters in the context of
counterparty credit risk, we refer to Gregory (2012).

The CDS contract referencing R has premium payment dates $t_1 < \cdots < t_N = T$,
where $t_1$ is greater than the current time $t$, and a fixed spread $x$. The default-free short
rate is given by the $(\mathcal{G}_t)$-adapted process $(r_t)$. In discounting future cash flows it
will be convenient to use the abbreviation

$$D(s_1, s_2) = \exp\left(-\int_{s_1}^{s_2} r_u \, du\right), \quad 0 \le s_1 \le s_2 \le T.$$


The promised cash flow of a protection buyer position in the CDS between two time
points $s_1 < s_2$, discounted back to time $s_1$, will be denoted by $\Pi(s_1, s_2)$. Ignoring
for simplicity accrued premium payments, we therefore have

$$\Pi(s_1, s_2) = \int_{s_1}^{s_2} D(s_1, u)\, \delta^R \, dY_{R,u} - x \sum_{t_n \in (s_1, s_2]} D(s_1, t_n)(1 - Y_{R,t_n}). \tag{17.2}$$


The first term on the right-hand side of (17.2) represents the discounted default 
payment, and the second term corresponds to the discounted premium payment. 
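
For a given default scenario the promised cash flow is easy to evaluate. The sketch below is a simplified illustration under assumptions of our own: the short rate is a constant `r`, the default time of R is supplied as an input (`math.inf` if R does not default), the premium `spread` is paid at each payment date at which R has not yet defaulted, and any year-fraction weighting is taken to be absorbed in the spread. The function names are hypothetical.

```python
import math

def discount(s1, s2, r):
    """Discount factor D(s1, s2) for a constant short rate r."""
    return math.exp(-r * (s2 - s1))

def promised_cash_flow(s1, s2, tau_R, lgd_R, spread, payment_dates, r):
    """Promised cash flow of the protection buyer over (s1, s2], discounted to s1."""
    # Default leg: LGD paid at tau_R if R defaults within (s1, s2].
    default_leg = lgd_R * discount(s1, tau_R, r) if s1 < tau_R <= s2 else 0.0
    # Premium leg: spread paid at each date in (s1, s2] at which R is still alive.
    premium_leg = sum(
        spread * discount(s1, tn, r)
        for tn in payment_dates
        if s1 < tn <= s2 and tau_R > tn
    )
    return default_leg - premium_leg

dates = [1.0, 2.0, 3.0]
whole = promised_cash_flow(0.0, 3.0, 2.5, 0.6, 0.01, dates, 0.03)
# Splitting the interval at an intermediate time 1.2 and discounting the
# second piece back to time 0 gives the same value.
split = (promised_cash_flow(0.0, 1.2, 2.5, 0.6, 0.01, dates, 0.03)
         + discount(0.0, 1.2, 0.03)
         * promised_cash_flow(1.2, 3.0, 2.5, 0.6, 0.01, dates, 0.03))
print(whole, split)
```

The additivity checked in the last lines follows from the multiplicative property of the discount factor, and it is what allows a cash-flow stream to be decomposed at an intermediate date.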

The value at some stopping time $\tau \ge t$ of the promised cash-flow stream for B is
then given by

$$V_\tau := E^Q(\Pi(\tau, T) \mid \mathcal{G}_\tau); \tag{17.3}$$

sometimes we will call $(V_t)$ the counterparty-risk-free CDS price. The discounted
cash flows that are made or received by B over the period $(s_1, s_2]$ (the real cash
flows) are denoted by $\Pi^{\mathrm{real}}(s_1, s_2)$. Note that $\Pi$ and $\Pi^{\mathrm{real}}$ are in general different
as S or B might default before the maturity date $T$ of the transaction.

In order to describe $\Pi^{\mathrm{real}}$ we distinguish the following scenarios.


• If $T_1 > T$ or if $T_1 < T$ and $\xi_1 = R$, that is if both S and B survive until the
maturity date of the CDS, the actual and promised cash-flow streams coincide,
so that $\Pi^{\mathrm{real}}(t, T) = \Pi(t, T)$.

• Consider next the scenario where $T_1 < T$ and $\xi_1 = S$, that is the protection
seller defaults first and this occurs before the maturity date of the CDS. In
that case, prior to $T_1$, actual and promised cash flows coincide. At $T_1$, if the
counterparty-risk-free CDS price $V_{T_1}$ is positive, B is entitled to charge a
close-out amount to S in order to settle the contract. Following the literature
we assume that this close-out amount is given by $V_{T_1}$. However, S is typically
unable to pay the close-out amount in full, and B receives only a recovery
payment of size $(1 - \delta^S)V_{T_1}$. If, on the other hand, $V_{T_1}$ is negative, B has
to pay the full amount $|V_{T_1}|$ to S. Using the notation $x^+ = \max\{x, 0\}$ and
$x^- = -\min\{x, 0\}$, the actual cash flows on the set $\{T_1 < T\} \cap \{\xi_1 = S\}$ are
given by

$$\Pi^{\mathrm{real}}(t, T) = \Pi(t, T_1) + D(t, T_1)\bigl((1 - \delta^S)V_{T_1}^+ - V_{T_1}^-\bigr). \tag{17.4}$$

• Finally, consider the scenario where $T_1 < T$ and $\xi_1 = B$. If $V_{T_1} > 0$, S has to
pay the full amount $V_{T_1}$ to B; if $V_{T_1} < 0$, the protection buyer makes a recovery
payment of size $(1 - \delta^B)|V_{T_1}|$ to S. Thus, on the set $\{T_1 < T\} \cap \{\xi_1 = B\}$ we
have

$$\Pi^{\mathrm{real}}(t, T) = \Pi(t, T_1) + D(t, T_1)\bigl(V_{T_1}^+ - (1 - \delta^B)V_{T_1}^-\bigr). \tag{17.5}$$
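
The close-out rules in the last two scenarios can be summarized in a few lines of code. This is only a sketch of the settlement payment at the first default time, seen from the viewpoint of B; the function name and the string labels for the defaulting party are our own, and the inputs are a hypothetical close-out value and hypothetical LGDs.

```python
def close_out_payment(v, first_defaulter, lgd_S, lgd_B):
    """Settlement amount received by B at the first default time.

    v is the counterparty-risk-free CDS value for B at that time; a negative
    return value means that B pays.
    """
    v_plus, v_minus = max(v, 0.0), -min(v, 0.0)
    if first_defaulter == "S":
        # S pays only the recovery on a positive value; B pays a negative value in full.
        return (1.0 - lgd_S) * v_plus - v_minus
    if first_defaulter == "B":
        # S pays a positive value in full; B pays only the recovery on a negative value.
        return v_plus - (1.0 - lgd_B) * v_minus
    raise ValueError("first_defaulter must be 'S' or 'B'")

# Loss of B relative to the risk-free value when S defaults and v > 0 ...
loss_to_B = 10.0 - close_out_payment(10.0, "S", lgd_S=0.6, lgd_B=0.4)
# ... and gain of B relative to the risk-free value when B defaults and v < 0.
gain_to_B = close_out_payment(-10.0, "B", lgd_S=0.6, lgd_B=0.4) - (-10.0)
print(loss_to_B, gain_to_B)
```

The shortfall relative to the risk-free value equals $\delta^S V_{T_1}^+$ when S defaults first, and B's gain equals $\delta^B V_{T_1}^-$ when B defaults first; these are the two quantities that enter the value adjustments.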


The correct value of the protection-buyer position in the CDS in the presence of
counterparty risk is given by $E^Q(\Pi^{\mathrm{real}}(t, T) \mid \mathcal{G}_t)$. For $t < T_1$ the difference

$$\mathrm{BCVA}_t := E^Q(\Pi(t, T) \mid \mathcal{G}_t) - E^Q(\Pi^{\mathrm{real}}(t, T) \mid \mathcal{G}_t) \tag{17.6}$$

is known as the bilateral credit value adjustment (BCVA) at time $t$. Note that
$E^Q(\Pi^{\mathrm{real}}(t, T) \mid \mathcal{G}_t) = V_t - \mathrm{BCVA}_t$: that is, $\mathrm{BCVA}_t$ is the adjustment that needs
to be made to the counterparty-risk-free CDS price in order to obtain the value of
the cash-flow stream $\Pi^{\mathrm{real}}$. The term bilateral refers to the fact that the value adjustment
takes the possibility of the default of both contracting parties, B and S, into
account. By definition, the bilateral value adjustment is symmetric in the sense that
the value adjustment computed from the viewpoint of the protection seller at time
$t$ is given by $-\mathrm{BCVA}_t$; this is obvious since the cash-flow stream received by S
is exactly the negative of the cash-flow stream received by B. This contrasts with 
so-called unilateral value adjustments where each party neglects the possibility of 
its own default in computing the adjustment to the value of the CDS. 

The next proposition gives a more succinct expression for the BCVA. 


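Proposition 17.1. For $t < T_1$ we have that $\mathrm{BCVA}_t = \mathrm{CVA}_t - \mathrm{DVA}_t$, where

$$\mathrm{CVA}_t = E^Q\bigl(I_{\{T_1 < T\}} I_{\{\xi_1 = S\}} D(t, T_1)\, \delta^S V_{T_1}^+ \mid \mathcal{G}_t\bigr), \tag{17.7}$$

$$\mathrm{DVA}_t = E^Q\bigl(I_{\{T_1 < T\}} I_{\{\xi_1 = B\}} D(t, T_1)\, \delta^B V_{T_1}^- \mid \mathcal{G}_t\bigr). \tag{17.8}$$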


Comments. The CVA in (17.7) reflects the potential loss incurred by B due to a 
premature default of S; the debt value adjustment, or DVA, in (17.8) reflects the 
potential loss incurred by S due to a premature default of B. A similar formula 
obviously holds for other products; the only part that needs to be adapted is the 
definition of the promised cash-flow stream in (17.2). 

Accounting rules require that both CVA and DVA have to be taken into account if 
an instrument is valued via mark-to-market accounting techniques. Note, however, 
that the use of DVA is somewhat controversial for the following reason: a decrease 
in the credit quality of B leads to an increase in the probability that B defaults first 
and hence to a larger DVA term. If both CVA and DVA are taken into account in 
the valuation of the CDS, this would be reported as a profit for B. It is not clear, 
however, how B could turn this accounting profit into an actual cash flow for its 
shareholders. 

Proposition 17.1 shows that the problem of computing the BCVA amounts to
computing the price of a call option and a put option on $(V_t)$ with strike $K = 0$
and random maturity date $T_1$. The computation of the value adjustments is therefore
more involved than the pricing of the CDS itself, and a dynamic portfolio credit 
model is needed to compute the value adjustment in a consistent way. The actual 
computation of value adjustments depends on the structure of the underlying credit 
model. For further information we refer to Sections 17.3.3 and 17.4.4. 


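Proof of Proposition 17.1. For $t < T_1 < s$ it holds that

$$D(t,s) = D(t,T_1)\,D(T_1,s).$$

Hence, on the set $\{T_1 < T\}$ we may write $\Pi(t,T) = \Pi(t,T_1) + D(t,T_1)\Pi(T_1,T)$.
This yields

$$\Pi(t,T) - \Pi^{\text{real}}(t,T) = I_{\{T_1<T\}} D(t,T_1)\Big(\Pi(T_1,T) - I_{\{\xi_1=S\}}\big((1-\delta^S)V_{T_1}^+ - V_{T_1}^-\big) - I_{\{\xi_1=B\}}\big(V_{T_1}^+ - (1-\delta^B)V_{T_1}^-\big)\Big).$$

By iterated conditional expectations it follows that

$$E^Q\big(\Pi(t,T) - \Pi^{\text{real}}(t,T) \mid \mathcal{G}_t\big) = E^Q\big(E^Q(\Pi(t,T) - \Pi^{\text{real}}(t,T) \mid \mathcal{G}_{T_1}) \mid \mathcal{G}_t\big). \tag{17.9}$$

We concentrate on the inner conditional expectation. Since $D(t,T_1)$ and the events
$\{T_1 < T\}$, $\{\xi_1 = S\}$ and $\{\xi_1 = B\}$ are $\mathcal{G}_{T_1}$-measurable, we obtain

$$E^Q\big(\Pi(t,T) - \Pi^{\text{real}}(t,T) \mid \mathcal{G}_{T_1}\big) \tag{17.10 a}$$
$$= I_{\{T_1<T\}} I_{\{\xi_1=S\}} D(t,T_1)\, E^Q\big(\Pi(T_1,T) - ((1-\delta^S)V_{T_1}^+ - V_{T_1}^-) \mid \mathcal{G}_{T_1}\big) \tag{17.10 b}$$
$$\quad + I_{\{T_1<T\}} I_{\{\xi_1=B\}} D(t,T_1)\, E^Q\big(\Pi(T_1,T) - (V_{T_1}^+ - (1-\delta^B)V_{T_1}^-) \mid \mathcal{G}_{T_1}\big). \tag{17.10 c}$$

Now, by the definition of $(V_t)$ we have that $E^Q(\Pi(T_1,T) \mid \mathcal{G}_{T_1}) = V_{T_1}$. Moreover,
we use the decomposition $V_{T_1} = V_{T_1}^+ - V_{T_1}^-$, where $V_{T_1}^+$ and $V_{T_1}^-$ are $\mathcal{G}_{T_1}$-measurable.
Hence (17.10 b) equals $I_{\{T_1<T\}} I_{\{\xi_1=S\}} D(t,T_1)\,\delta^S V_{T_1}^+$ and, similarly, (17.10 c) equals
$-I_{\{T_1<T\}} I_{\{\xi_1=B\}} D(t,T_1)\,\delta^B V_{T_1}^-$. Putting these together, (17.10 a) is equal to

$$I_{\{T_1<T\}}\big(I_{\{\xi_1=S\}} D(t,T_1)\,\delta^S V_{T_1}^+ - I_{\{\xi_1=B\}} D(t,T_1)\,\delta^B V_{T_1}^-\big),$$

and substituting this into (17.9) gives the result.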


Simplified value adjustments and wrong-way risk. In order to simplify the
computation of value adjustments, it is often assumed that $(Y_{t,S})$, $(Y_{t,B})$ and the
counterparty-risk-free CDS price $(V_t)$ are independent stochastic processes and that
the risk-free interest rate is deterministic. We now explain how the value adjustment
formulas (17.7) and (17.8) simplify under this independence assumption. For
simplicity we consider the case $t = 0$. Denote by $\bar F_S(t)$, $\bar F_B(t)$, $f_S(t)$ and $f_B(t)$ the
survival functions and densities of $\tau_S$ and $\tau_B$. Since $\{\xi_1 = S\} = \{\tau_S < \tau_B\} \cap \{\tau_S < \tau_R\}$
and since $V_{T_1} = V_{\tau_S}$ on $\{\xi_1 = S\}$, we obtain

$$\mathrm{CVA} := \mathrm{CVA}_0 = E^Q\big(I_{\{\tau_S < T\}} I_{\{\tau_S < \tau_B\}} I_{\{\tau_S < \tau_R\}} D(0,\tau_S)\,\delta^S V_{\tau_S}^+\big)
= \delta^S \int_0^T E^Q\big(I_{\{\tau_S < \tau_B\}} D(0,\tau_S) V_{\tau_S}^+ \mid \tau_S = t\big) f_S(t)\,dt.$$

In the last line we have used the fact that $\delta^S$ is deterministic and we have used the
identity $V_t = 0$ on $\{\tau_R < t\}$, which allows the indicator $I_{\{\tau_S < \tau_R\}}$ to be dropped. The
independence of the processes $(Y_{t,S})$, $(Y_{t,B})$, $(V_t)$ and the fact that interest rates are
deterministic imply that

$$E^Q\big(I_{\{\tau_S < \tau_B\}} D(0,\tau_S) V_{\tau_S}^+ \mid \tau_S = t\big) = E^Q\big(I_{\{t < \tau_B\}} D(0,t) V_t^+\big) = \bar F_B(t)\,D(0,t)\,E^Q(V_t^+).$$

Hence $\mathrm{CVA}^{\text{indep}}$, the credit value adjustment at $t = 0$ under the independence
assumption, is given by

$$\mathrm{CVA}^{\text{indep}} := \mathrm{CVA}_0^{\text{indep}} = \delta^S \int_0^T \bar F_B(t)\,D(0,t)\,E^Q(V_t^+)\,f_S(t)\,dt, \tag{17.11}$$

and, by a similar argument, the debt value adjustment under independence is

$$\mathrm{DVA}^{\text{indep}} := \mathrm{DVA}_0^{\text{indep}} = \delta^B \int_0^T \bar F_S(t)\,D(0,t)\,E^Q(V_t^-)\,f_B(t)\,dt. \tag{17.12}$$
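As a rough numerical illustration of (17.11) and (17.12), the following Python sketch evaluates the two integrals with the trapezoidal rule. The constant hazard rates for $\tau_S$ and $\tau_B$, the constant risk-free rate and the flat expected-exposure profiles are assumptions made purely for illustration; the function names are hypothetical and not part of the text.

```python
import math


def trapezoid(xs, ys):
    """Trapezoidal rule for samples ys on the increasing grid xs."""
    return sum(0.5 * (ys[k] + ys[k + 1]) * (xs[k + 1] - xs[k])
               for k in range(len(xs) - 1))


def simplified_adjustments(times, ee_pos, ee_neg, haz_s, haz_b, rate,
                           lgd_s, lgd_b):
    """Return (CVA_indep, DVA_indep) as in (17.11) and (17.12).

    Illustrative assumptions: constant hazard rates haz_s and haz_b, so that
    F_bar(t) = exp(-haz * t) and f(t) = haz * exp(-haz * t), and a constant
    risk-free rate, so that D(0, t) = exp(-rate * t).
    ee_pos[k] and ee_neg[k] are the expected exposures E(V_t^+) and E(V_t^-)
    at times[k]; lgd_s and lgd_b play the role of delta^S and delta^B.
    """
    cva_integrand, dva_integrand = [], []
    for t, ep, en in zip(times, ee_pos, ee_neg):
        disc = math.exp(-rate * t)
        surv_s, surv_b = math.exp(-haz_s * t), math.exp(-haz_b * t)
        cva_integrand.append(surv_b * disc * ep * haz_s * surv_s)
        dva_integrand.append(surv_s * disc * en * haz_b * surv_b)
    return (lgd_s * trapezoid(times, cva_integrand),
            lgd_b * trapezoid(times, dva_integrand))


# Toy input: flat expected exposures on a five-year horizon.
T, n = 5.0, 1000
grid = [T * k / n for k in range(n + 1)]
cva, dva = simplified_adjustments(grid, [0.02] * (n + 1), [0.01] * (n + 1),
                                  haz_s=0.03, haz_b=0.02, rate=0.01,
                                  lgd_s=0.6, lgd_b=0.6)
print(f"CVA_indep = {cva:.6f}, DVA_indep = {dva:.6f}, "
      f"BCVA_indep = {cva - dva:.6f}")
```

Only the marginal default distributions and the two expected-exposure profiles enter the computation, which is what makes the independence formulas cheap to evaluate.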
Note that formulas (17.11) and (17.12) are much easier to evaluate than the
“correct” expressions (17.7) and (17.8). In particular, we only need to determine the
marginal distribution of $\tau_S$ and $\tau_B$ and the so-called expected exposures $E^Q(V_t^+)$
and $E^Q(V_t^-)$. On the other hand, the assumption that $(V_t)$, $(Y_{t,S})$ and $(Y_{t,B})$ are
independent is often difficult to justify, and the simplified adjustments can be
misleading. Consider, for instance, the case where S, R and B are financial institutions
and suppose that $T_1 < T$ and $\xi_1 = S$. In that case it is quite likely that $\tau_S$ falls in
a time period where financial institutions face adverse conditions so that the credit
spread of the reference entity at $\tau_S$ and, hence, the market value $V_{\tau_S}$ of the CDS
referencing R are comparatively high. We therefore expect that

$$E^Q(V_{\tau_S}^+ \mid \tau_S = t) > E^Q(V_t^+),$$

so that $\mathrm{CVA} > \mathrm{CVA}^{\text{indep}}$. Similarly, we expect that $E^Q(V_{\tau_B}^- \mid \tau_B = t) < E^Q(V_t^-)$,
so that $\mathrm{DVA} < \mathrm{DVA}^{\text{indep}}$. The aggregate effect would be that

$$\mathrm{BCVA} > \mathrm{BCVA}^{\text{indep}}$$


in that case. Some numerical results that support this intuition are given in Sec- 
tion 17.4.4. The phenomenon whereby the conditional expected exposure given the 
default of the counterparty is higher than the unconditional expected exposure is 
a typical example of an unfavourable dependence between the size of an exposure 
and the credit quality of the counterparty. In counterparty risk management, such 
an unfavourable dependence is known as wrong-way risk (since the exposure to a 
counterparty and the credit quality of that party evolve in the “wrong way”). Wrong- 
way risk is an important issue in counterparty risk management (see, for example, 
Chapter 15 of Gregory (2012)). 


Unilateral credit value adjustments. In a unilateral value adjustment each party
neglects the possibility of its own default. The unilateral value adjustment for the
protection buyer B is therefore obtained from the formula for the bilateral value
adjustment by assuming that $Q(\xi_1 = B) = 0$. This gives

$$\mathrm{UCVA}_t = E^Q\big(I_{\{\tau_S < T\}} D(t,\tau_S)\,\delta^S V_{\tau_S}^+ \mid \mathcal{G}_t\big).$$


An analogous formula holds for the unilateral value adjustment of the protection 
seller. Unilateral value adjustments avoid the problem that a worsening credit spread 
of a financial institution leads to an accounting profit. On the other hand, if B and S 
use unilateral adjustments, they might come to different conclusions about the value 
of a given deal. 


Netting. A further issue that arises in practice is netting. Under a legally enforceable 
netting agreement the market value of all CDS transactions between B and S at $T_1$
is computed and only the aggregate value is subject to bankruptcy procedures. In 
particular, perfectly offsetting transactions cancel each other out. Netting can reduce 
counterparty risk substantially, so netting agreements are widely used in practice. On 
the other hand, netting substantially increases the computational complexity of CVA 
and DVA computations, as we now explain. Suppose that there are N transactions 
between B and S that fall under a netting agreement and let these be indexed by
$n \in \{1,\dots,N\}$. Denote by $(V_{t,n})$ the market value from the point of view of B of the
$n$th transaction. A similar argument to the one used in the proof of Proposition 17.1
implies that

$$\mathrm{CVA}_t = E^Q\Big(I_{\{T_1<T\}} I_{\{\xi_1=S\}} D(t,T_1)\,\delta^S \Big(\sum_{n=1}^N V_{T_1,n}\Big)^+ \,\Big|\, \mathcal{G}_t\Big),$$

and a similar formula applies to the debt value adjustment. Hence in the presence of
netting agreements the computation of value adjustments amounts to the pricing of
an option on the sum of the market values of all transactions covered by the netting
agreement. In the case of CDS contracts each would typically refer to a different
reference entity, so we have to consider $N + 2$ different default times. This is in
general a much more difficult problem than pricing the options individually.
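The effect of netting on the exposure at default can be seen in a few lines of Python. The sketch below compares the exposure of the netted portfolio, $(\sum_n V_{T_1,n})^+$, with the sum of the individual exposures $\sum_n V_{T_1,n}^+$ that would apply without a netting agreement; the deal values are hypothetical numbers chosen for illustration.

```python
def positive_part(x):
    """Return x^+ = max(x, 0)."""
    return max(x, 0.0)


def exposure_without_netting(values):
    """Sum of the individual exposures V_n^+ (each deal settled separately)."""
    return sum(positive_part(v) for v in values)


def exposure_with_netting(values):
    """Exposure (sum_n V_n)^+ of the netted portfolio."""
    return positive_part(sum(values))


# Hypothetical market values of three deals from B's point of view at T_1.
deal_values = [4.0, -3.0, 1.5]
gross = exposure_without_netting(deal_values)
net = exposure_with_netting(deal_values)
print(f"exposure without netting: {gross}, with netting: {net}")
```

Since $(\sum_n x_n)^+ \le \sum_n x_n^+$, the netted exposure can never exceed the gross exposure, and perfectly offsetting deals contribute nothing.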


17.2.2 Collateralized Value Adjustments for a CDS 


In this section we introduce popular collateralization strategies and analyse qualita- 
tively the impact of collateralization on credit value adjustments for a CDS. To keep 
things simple we assume that the collateral is posted in the form of cash and earns 
the risk-free rate of interest. Many collateralization arrangements used in practice 
are of this form, but arrangements where other securities are used as collateral can 
also be found. 

Details of the collateralization arrangement for an OTC CDS transaction are fixed 
in the credit support annex of the transaction. Roughly speaking, the procedure works 
as follows. At $t_0 = 0$ a collateral account is opened. Let $C_t$ denote the cash balance
in the account at time $t$. Here $C_t > 0$ means that S has posted the collateral and
that B is the collateral taker, whereas $C_t < 0$ means that B has posted the collateral
and that S is the collateral taker. The collateral position is updated at discrete time
points $t_1, \dots, t_N < T$. At $t_1$ the collateral taker pays interest on the collateral, and
the cash balance $C_{t_1}$ is adjusted in reaction to changes in the price of the underlying
CDS over $(t_0, t_1]$. This procedure continues up to the maturity of the CDS or until
the first default occurs. If $T_1 > T$ or if $T_1 < T$ and $\xi_1 = R$, the collateral account is
closed at the “natural end” of the contract, so $C_t = 0$ for $t > T_1 \wedge T$. If there is an
early default of B or S (that is, if $T_1 < T$ and $\xi_1 \in \{B, S\}$), the collateral is used
to reduce the loss of the collateral taker and any remaining collateral is returned.

An issue arising in this context is rehypothecation. The collateral taker typically 
has unrestricted access to the posted collateral; in particular, the funds can be used 
as collateral in other OTC derivative transactions. It is therefore possible to have a 
situation in which the collateral taker defaults and a part of the collateral that should 
be returned is missing. To keep things simple we ignore this issue and assume that, 
in the case of a default of the collateral taker, the collateral is always returned in full 
to the other party. We refer to Notes and Comments for references regarding credit 
value adjustments in the presence of rehypothecation. 
Collateralization strategies. We describe the cash balance in the collateral account
by a $(\mathcal{G}_t)$-adapted process $(C_t)$, the so-called collateralization strategy. For
convenience we allow for strategies where the collateral account is changed
continuously and not just at predetermined rebalancing dates. Recall that the
counterparty-risk-free CDS price is denoted by $(V_t)$. In practice, most collateralization
arrangements take the form of a threshold collateralization strategy. Formally, a
threshold collateralization strategy with thresholds $M_1, M_2 \ge 0$, labelled $(C_t^{M_1,M_2})$,
is given for $0 \le t \le T_1 \wedge T$ by

$$C_t^{M_1,M_2} = (V_t - M_1)\,I_{\{V_t > M_1\}} + (V_t + M_2)\,I_{\{V_t < -M_2\}}. \tag{17.13}$$

Under this strategy collateral is posted if $V_t^+$ (the exposure of B) exceeds the
threshold $M_1$ or if $V_t^-$ (the exposure of S) exceeds the threshold $M_2$. A threshold strategy is
used if B and S want to protect themselves against severe counterparty-risk-related
losses, while accepting the possibility of smaller losses in order to simplify the
practical management of the collateralization process. For $M_1 = M_2 = 0$ we obtain the
special case of market-value collateralization with

$$C_t^{\text{market}} = V_t, \quad 0 \le t \le T_1 \wedge T. \tag{17.14}$$
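The threshold rule can be written as a short function. The following Python sketch assumes the form $C_t^{M_1,M_2} = (V_t - M_1) I_{\{V_t > M_1\}} + (V_t + M_2) I_{\{V_t < -M_2\}}$ for the threshold strategy; the function name and the numerical inputs are hypothetical.

```python
def threshold_collateral(v, m1, m2):
    """Collateral balance under threshold collateralization for a CDS value v.

    A positive result means that S has posted collateral (B is the collateral
    taker); a negative result means that B has posted collateral.
    """
    if m1 < 0 or m2 < 0:
        raise ValueError("thresholds must be non-negative")
    if v > m1:
        return v - m1      # exposure of B above its threshold M1
    if v < -m2:
        return v + m2      # exposure of S above its threshold M2
    return 0.0             # both exposures are within the thresholds


# Market-value collateralization is the special case M1 = M2 = 0.
for value in (-5.0, -1.0, 0.0, 1.0, 5.0):
    print(value,
          threshold_collateral(value, 2.0, 3.0),
          threshold_collateral(value, 0.0, 0.0))
```

With positive thresholds the uncollateralized part of the exposure is capped at $M_1$ for B and at $M_2$ for S, which is the trade-off between protection and operational simplicity described above.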


Collateralized value adjustment. Value adjustments for collateralized CDS con- 
tracts are largely analogous to the uncollateralized case, so we keep our presentation 
short. The bilateral collateralized value adjustment BCCVA is the difference between 
the collateralized credit value adjustment CCVA and the collateralized debt value 
adjustment CDVA. As before, the CCVA gives the value of the potential loss for B 
due to an early default of S, whereas the CDVA gives the value of the potential loss 
for S due to an early default of B. 

In order to describe these potential losses we have to consider the payments at an
early default. Note that no additional collateral is posted at or after the default of B
or S. The amount of collateral available for the settlement of the contract is therefore
given by $C_{T_1-}$ (the amount of collateral that has been posted immediately prior to
$T_1$). This distinction matters if the close-out amount $(V_t)$ jumps at $T_1$, for instance
due to contagion effects, or if there is some delay between the last adjustment of the
collateral account and the settlement of the positions. We begin with the scenario
where the protection seller defaults first. We have to distinguish the cases $V_{T_1} > 0$
and $V_{T_1} < 0$.

• Suppose that $V_{T_1} > 0$ and that the protection buyer is the collateral taker,
that is, $C_{T_1-} > 0$. In that case the collateral is used to reduce the loss of the
protection buyer and excess collateral is returned. If $C_{T_1-}$ is smaller than $V_{T_1}$,
the protection buyer incurs a counterparty-risk-related loss of size
$\delta^S(V_{T_1} - C_{T_1-})$; if $C_{T_1-} \ge V_{T_1}$ the amount of collateral is sufficient to protect B from
losses due to counterparty risk. If S is the collateral taker, i.e. if $C_{T_1-} < 0$,
there is no available collateral to protect B and he suffers a loss of size $\delta^S V_{T_1}$.

• Suppose that $V_{T_1} < 0$. In that case B has no exposure to S, so he does not
suffer a loss related to counterparty risk (the fact that he has incurred a loss
due to the decrease in the counterparty-risk-free CDS price is irrelevant for
the computation of value adjustments for counterparty risk).

Summarizing, the counterparty-risk-related loss of B is given by

$$I_{\{T_1<T\}} I_{\{\xi_1=S\}}\,\delta^S\big(V_{T_1}^+ - C_{T_1-}^+\big)^+.$$


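Similarly, S suffers a loss in the scenario where $V_{T_1} < 0$ and where there is
insufficient collateral to settle the contract in full, that is, for $V_{T_1}^- > C_{T_1-}^-$. The
counterparty-risk-related loss of S is thus given by

$$I_{\{T_1<T\}} I_{\{\xi_1=B\}}\,\delta^B\big(V_{T_1}^- - C_{T_1-}^-\big)^+.$$

Thus $\mathrm{BCCVA}_t$, the bilateral collateralized credit value adjustment at time $t$, is given
by

$$\mathrm{BCCVA}_t = \mathrm{CCVA}_t - \mathrm{CDVA}_t, \tag{17.15}$$

where

$$\mathrm{CCVA}_t := E^Q\big(I_{\{T_1<T\}} I_{\{\xi_1=S\}} D(t,T_1)\,\delta^S (V_{T_1}^+ - C_{T_1-}^+)^+ \mid \mathcal{G}_t\big),$$
$$\mathrm{CDVA}_t := E^Q\big(I_{\{T_1<T\}} I_{\{\xi_1=B\}} D(t,T_1)\,\delta^B (V_{T_1}^- - C_{T_1-}^-)^+ \mid \mathcal{G}_t\big).$$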


Without collateralization, i.e. for $C_t \equiv 0$, formula (17.15) reduces to the simpler
result of Proposition 17.1.


Performance of market-value collateralization. The sum $\mathrm{CCVA}_t + \mathrm{CDVA}_t$ gives
the value at time $t$ of the entire counterparty-risk-related loss, and it can therefore be
viewed as a measure of the performance of a given collateralization strategy. Here
we make the following immediate observation: suppose that market-value
collateralization with $C_t^{\text{market}} = V_t$ is used and that the market value of the CDS does not
jump at $T_1$, that is, $V_{T_1} = V_{T_1-}$ almost surely. In that case the formulas for $\mathrm{CCVA}_t$
and $\mathrm{CDVA}_t$ in (17.15) show that

$$\mathrm{CCVA}_t = \mathrm{CDVA}_t = 0, \quad t < T_1,$$

so that market-value collateralization works perfectly. If, on the other hand,
$|\Delta V_{T_1}| = |V_{T_1} - V_{T_1-}|$ is comparatively large, market-value collateralization
performs less well. Some numerical results supporting this observation
will be presented in Section 17.4.4.
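The loss expressions entering the CCVA and the CDVA can be checked on simple numbers. The Python sketch below evaluates the loss of B, $\delta^S(V_{T_1}^+ - C_{T_1-}^+)^+$, and the loss of S, $\delta^B(V_{T_1}^- - C_{T_1-}^-)^+$, at an early default; the function names and the inputs (including the loss-given-default values) are hypothetical and serve only to illustrate the role of a jump in the CDS value at $T_1$.

```python
def pos(x):
    """Return x^+ = max(x, 0)."""
    return max(x, 0.0)


def neg(x):
    """Return x^- = -min(x, 0)."""
    return max(-x, 0.0)


def loss_of_buyer(v_default, c_before, lgd_s):
    """Loss of B if S defaults first: delta^S * (V^+ - C^+)^+."""
    return lgd_s * pos(pos(v_default) - pos(c_before))


def loss_of_seller(v_default, c_before, lgd_b):
    """Loss of S if B defaults first: delta^B * (V^- - C^-)^+."""
    return lgd_b * pos(neg(v_default) - neg(c_before))


# Market-value collateralization: the collateral equals the CDS value
# immediately before the default, C_{T_1-} = V_{T_1-}.
v_before = 2.0
print("no jump:  ", loss_of_buyer(v_before, v_before, lgd_s=0.6))
print("jump to 5:", loss_of_buyer(5.0, v_before, lgd_s=0.6))
```

Without a jump the collateral covers the close-out amount exactly and the loss vanishes; with an upward jump of the CDS value at the default of S, only the pre-default value is collateralized and B loses the fraction $\delta^S$ of the jump.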


Notes and Comments 


The literature on counterparty risk management is growing rapidly, leading to a 
proliferation of valuation-adjustment acronyms (CVA, DVA, FVA and others). A 
detailed introduction can be found in the textbooks by Gregory (2012), Cesari et al. 
(2009) and Brigo, Morini and Pallavicini (2013). A non-technical introduction to 
the computation of value adjustments for counterparty risk is given in the papers by 
Hull and White (2012, 2013). 
The derivation of the bilateral credit value adjustments in Proposition 17.1 and
(17.15) is based on the papers by Brigo and Chourdakis (2009) and Brigo, Capponi
and Pallavicini (2014) (see also Frey and Rösler 2014). The last two papers also
consider the case of rehypothecation and discuss the actual computation of value 
adjustments in various portfolio credit risk models. Credit value adjustments in 
structural credit risk models are studied in Lipton and Sepp (2009). A very general 
technical analysis of value adjustments is given in Crépey (2012a,b). 


17.3 Conditionally Independent Default Times 


In this section we discuss models with conditionally independent default times. We 
begin with a discussion of general mathematical properties; applications and specific 
examples from the literature are considered in Sections 17.3.2 and 17.3.3. 


17.3.1 Definition and Mathematical Properties 


Throughout we consider a portfolio of $m$ obligors with default times $\tau_i$ and default
indicators $Y_{t,i} = I_{\{\tau_i \le t\}}$, $1 \le i \le m$, on a probability space $(\Omega, \mathcal{F}, P)$. The ordered
default times are denoted by $0 = T_0 < T_1 < \dots < T_m$, and $\xi_n \in \{1,\dots,m\}$ gives
the identity of the firm defaulting at time $T_n$. We introduce the filtrations $(\mathcal{H}_t^i)$,
$1 \le i \le m$, and $(\mathcal{H}_t)$ defined by

$$\mathcal{H}_t^i = \sigma(\{Y_{s,i} : s \le t\}) \quad\text{and}\quad \mathcal{H}_t = \mathcal{H}_t^1 \vee \dots \vee \mathcal{H}_t^m. \tag{17.16}$$

$(\mathcal{H}_t^i)$ is the filtration generated by the default observation for obligor $i$ alone; $(\mathcal{H}_t)$ is
the filtration generated by default observations for all obligors. Often $(\mathcal{H}_t)$ is called
the default history of the portfolio or the internal filtration generated by the default
times $\tau_1, \dots, \tau_m$. The definition of conditionally independent default times is a
straightforward multivariate extension of the notion of doubly stochastic default
times from Section 10.5.1. In particular, the distribution of the default times is
affected by additional information on top of the default history $(\mathcal{H}_t)$. Formally,
we represent this information by a filtration $(\mathcal{F}_t)$ on the underlying probability
space. Typically, $(\mathcal{F}_t)$ is generated by some observable background process. The
information available to investors is given by the filtration $(\mathcal{G}_t) = (\mathcal{F}_t) \vee (\mathcal{H}_t)$ (see
also (10.46)).


Definition 17.2. The default times $\tau_1, \dots, \tau_m$ are conditionally independent doubly
stochastic random times if there are positive, $(\mathcal{F}_t)$-adapted processes $(\gamma_{t,i})$, $1 \le i \le m$,
with $\Gamma_{t,i} := \int_0^t \gamma_{s,i}\,ds$ strictly increasing and finite for every $t > 0$, such that

$$P(\tau_1 > t_1, \dots, \tau_m > t_m \mid \mathcal{F}_\infty) = \prod_{i=1}^m \exp\Big(-\int_0^{t_i} \gamma_{s,i}\,ds\Big). \tag{17.17}$$

Note that the definition implies that each of the $\tau_i$ is a doubly stochastic random
time with conditional hazard process $(\gamma_{t,i})$ in the sense of Definition 10.10 and that
the rvs $\tau_1, \dots, \tau_m$ are conditionally independent given $\mathcal{F}_\infty$.
Construction and simulation via thresholds. Lemma 17.3 extends Lemma 10.11.


Lemma 17.3. Let $(\gamma_{t,1}), \dots, (\gamma_{t,m})$ be positive, $(\mathcal{F}_t)$-adapted processes such
that $\Gamma_{t,i} := \int_0^t \gamma_{s,i}\,ds$ is strictly increasing and finite for any $t > 0$. Let
$X = (X_1, \dots, X_m)'$ be a vector of independent, standard exponentially distributed rvs
independent of $\mathcal{F}_\infty$. Define

$$\tau_i = \Gamma_i^{\leftarrow}(X_i) = \inf\{t \ge 0 : \Gamma_{t,i} \ge X_i\}.$$

Then $\tau_1, \dots, \tau_m$ are conditionally independent doubly stochastic random times with
hazard processes $(\gamma_{t,i})$, $1 \le i \le m$.


Proof. By the definition of $\tau_i$ we have $\tau_i > t \iff X_i > \Gamma_{t,i}$. The rvs $\Gamma_{t_i,i}$ are
now measurable with respect to $\mathcal{F}_\infty$, whereas the $X_i$ are mutually independent,
independent of $\mathcal{F}_\infty$ and standard exponentially distributed. We therefore infer that

$$P(\tau_1 > t_1, \dots, \tau_m > t_m \mid \mathcal{F}_\infty) = P(X_1 > \Gamma_{t_1,1}, \dots, X_m > \Gamma_{t_m,m} \mid \mathcal{F}_\infty)
= \prod_{i=1}^m P(X_i > \Gamma_{t_i,i} \mid \mathcal{F}_\infty)
= \prod_{i=1}^m e^{-\Gamma_{t_i,i}}, \tag{17.18}$$

which shows that the $\tau_i$ satisfy the conditions of Definition 17.2.


Lemma 17.3 is the basis for the following simulation algorithm. 
Algorithm 17.4 (multivariate threshold simulation). 


(1) Generate trajectories for the hazard processes $(\gamma_{t,i})$ for $i = 1, \dots, m$. The
same techniques as in the univariate case can be used here.


(2) Generate a vector $X$ of independent standard exponentially distributed rvs
(the threshold vector) and set $\tau_i = \Gamma_i^{\leftarrow}(X_i)$, $1 \le i \le m$.
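The two steps of multivariate threshold simulation can be sketched in Python as follows. The sketch assumes hazard trajectories that are piecewise constant on a time grid of width `dt`, so that the cumulative hazard is piecewise linear and its inverse can be computed exactly within a grid cell. The generator `toy_hazard_paths` is a hypothetical placeholder for step (1) and is not a model from the text; default times beyond the simulated horizon are reported as infinity.

```python
import math
import random


def toy_hazard_paths(m, n_steps, dt, rng):
    """Step (1) placeholder: hazards driven by one common positive factor.

    Hypothetical Euler scheme for a mean-reverting factor, floored at a small
    positive value so that all hazard rates stay positive.
    """
    psi, common = 0.03, []
    for _ in range(n_steps):
        common.append(psi)
        shock = 0.1 * math.sqrt(psi * dt) * rng.gauss(0.0, 1.0)
        psi = max(psi + 0.5 * (0.03 - psi) * dt + shock, 1e-4)
    # Obligor i has hazard rate 0.01 * (i + 1) + common factor.
    return [[0.01 * (i + 1) + p for p in common] for i in range(m)]


def threshold_default_time(hazard_path, dt, threshold):
    """Return inf{t >= 0 : Gamma_t >= threshold}.

    hazard_path[k] is the positive hazard rate on [k*dt, (k+1)*dt);
    math.inf signals that the threshold is not reached on the horizon.
    """
    cumulated = 0.0
    for k, gamma in enumerate(hazard_path):
        if cumulated + gamma * dt >= threshold:
            return k * dt + (threshold - cumulated) / gamma
        cumulated += gamma * dt
    return math.inf


def simulate_default_times(hazard_paths, dt, rng):
    """Step (2): one independent standard exponential threshold per obligor."""
    return [threshold_default_time(path, dt, rng.expovariate(1.0))
            for path in hazard_paths]


rng = random.Random(42)
paths = toy_hazard_paths(m=5, n_steps=100, dt=0.05, rng=rng)
print(simulate_default_times(paths, dt=0.05, rng=rng))
```

Dependence between the simulated default times arises only through the common component of the hazard paths; the thresholds themselves are independent.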


As in the univariate case (see Lemma 10.12), Lemma 17.3 has a converse. 


Lemma 17.5. Let $\tau_1, \dots, \tau_m$ be conditionally independent doubly stochastic
random times with $(\mathcal{F}_t)$-conditional hazard processes $(\gamma_{t,i})$. Define a random vector $X$
by setting $X_i = \Gamma_i(\tau_i)$, $1 \le i \le m$. Then $X$ is a vector of independent, standard
exponentially distributed rvs that is independent of $\mathcal{F}_\infty$, and $\tau_i = \Gamma_i^{\leftarrow}(X_i)$ almost
surely.


Proof. For $t_1, \dots, t_m \ge 0$ the conditional independence of the $\tau_i$ implies that

$$P(\Gamma_1(\tau_1) \le t_1, \dots, \Gamma_m(\tau_m) \le t_m \mid \mathcal{F}_\infty) = \prod_{i=1}^m P(\Gamma_i(\tau_i) \le t_i \mid \mathcal{F}_\infty).$$

Moreover, similar reasoning to the univariate case implies that

$$P(\Gamma_i(\tau_i) \le t_i \mid \mathcal{F}_\infty) = P(\tau_i \le \Gamma_i^{\leftarrow}(t_i) \mid \mathcal{F}_\infty) = 1 - e^{-t_i},$$

which proves that $X$ has the claimed properties.
Recursive default time simulation. We now describe a second recursive algorithm
for simulating conditionally independent default times, which can be more efficient
than multivariate threshold simulation. We need the following lemma, which gives
properties of the first default time $T_1$.


Lemma 17.6. Let $\tau_1, \dots, \tau_m$ be conditionally independent doubly stochastic
random times with hazard processes $(\gamma_{t,1}), \dots, (\gamma_{t,m})$. Then $T_1$ is a doubly stochastic
random time with $(\mathcal{F}_t)$-conditional hazard process $\bar\gamma_t := \sum_{i=1}^m \gamma_{t,i}$, $t \ge 0$.


Proof. Using the conditional independence of the $\tau_i$ we infer that

$$P(T_1 > t \mid \mathcal{F}_\infty) = P(\tau_1 > t, \dots, \tau_m > t \mid \mathcal{F}_\infty) = \prod_{i=1}^m \exp\Big(-\int_0^t \gamma_{s,i}\,ds\Big),$$

which is obviously equal to $\exp(-\int_0^t \bar\gamma_s\,ds)$. As this expression is $\mathcal{F}_t$-measurable,
the result follows.


Next we compute the conditional probability of the event $\{\xi_1 = i\}$ given the
time $T_1$ of the first default and full information about the background filtration.


Proposition 17.7. Under the assumptions of Lemma 17.6 we have

$$P(\xi_1 = i \mid \mathcal{F}_\infty \vee \sigma(T_1)) = \gamma_i(T_1)/\bar\gamma(T_1), \quad i \in \{1, \dots, m\}.$$


Proof. Conditional on $\mathcal{F}_\infty$ the $\tau_i$ are independent with deterministic hazard
functions $\gamma_i(t)$, so it is sufficient to prove the proposition for independent random times
with deterministic hazard functions. Fix some $t > 0$ and note that, since the
random vector $(\tau_1, \dots, \tau_m)$ has a joint density, the probability of having more than one
default in the interval $(t - h, t]$ is of order $o(h)$. Hence,

$$P(\{\xi_1 = i\} \cap \{T_1 \in (t-h, t]\})
= P(\{\tau_i \in (t-h, t]\} \cap \{\tau_j > t \text{ for all } j \ne i\}) + o(h)
= P(\tau_i \in (t-h, t]) \prod_{j \ne i} P(\tau_j > t) + o(h)$$

by the independence of the $\tau_i$. Since $P(\tau_i > t) = \exp(-\int_0^t \gamma_i(s)\,ds)$, $1 \le i \le m$,
the above expression equates to

$$\exp\Big(-\int_0^{t-h} \gamma_i(s)\,ds\Big)\Big(1 - \exp\Big(-\int_{t-h}^t \gamma_i(s)\,ds\Big)\Big) \times \prod_{j \ne i} \exp\Big(-\int_0^t \gamma_j(s)\,ds\Big) + o(h).$$

Hence,

$$\lim_{h \to 0+} \frac{1}{h} P(\{\xi_1 = i\} \cap \{T_1 \in (t-h, t]\}) = \gamma_i(t) \exp\Big(-\int_0^t \bar\gamma(s)\,ds\Big).$$

Moreover, by Lemma 17.6,

$$\lim_{h \to 0+} \frac{1}{h} P(T_1 \in (t-h, t]) = \bar\gamma(t) \exp\Big(-\int_0^t \bar\gamma(s)\,ds\Big),$$

so the claim follows from the definition of elementary conditional probability.


Algorithm 17.8 (recursive default time simulation). This algorithm simulates a
realization of the sequence $(T_n, \xi_n)$ up to some maturity date $T$. In order to
formulate the algorithm we introduce the notation $A_n$ for the set of non-defaulted firms
immediately after $T_n$. Formally, we set

$$A_0 := \{1, \dots, m\}, \quad A_n := \{1 \le i \le m : Y_{T_n,i} = 0\}, \quad n \ge 1. \tag{17.19}$$

Moreover we define, for $0 \le n < m$,

$$\bar\gamma_t^{(n)} := \sum_{i \in A_n} \gamma_{t,i}.$$

The algorithm proceeds in the following steps.


(1) Generate trajectories of the hazard processes $(\gamma_{t,i})$ up to time $T$ and set $n = 0$,
$T_0 = 0$ and $\xi_0 = 0$.


(2) Generate the waiting time $T_{n+1} - T_n$ by standard univariate threshold
simulation using Algorithm 10.13. For this we use a generalization of Lemma 17.6
that states that for conditionally independent defaults and $n < m$ we have

$$P(T_{n+1} - T_n > t \mid (T_1, \xi_1), \dots, (T_n, \xi_n), \mathcal{F}_\infty) = \exp\Big(-\int_{T_n}^{T_n + t} \bar\gamma_s^{(n)}\,ds\Big).$$


(3) If $T_{n+1} \ge T$, stop. Otherwise use Proposition 17.7 and determine $\xi_{n+1}$ as a
realization of a multinomial rv $\xi$ with

$$P(\xi = i) = \frac{\gamma_i(T_{n+1})}{\bar\gamma^{(n)}(T_{n+1})}, \quad i \in A_n.$$


(4) If $n + 1 = m$, stop. Otherwise increase $n$ by 1 and return to step (2).


Recursive default time simulation is particularly efficient if we only need to sim- 
ulate defaults occurring before some maturity date T and if defaults are rare. In that 
case, $T_n > T$ for $n$ relatively small, so only a few ordered default times need to
be simulated. With multivariate threshold simulation, on the other hand, we need to 
simulate the default times of all firms in the portfolio. 
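The recursive scheme can be sketched in Python under the same simplifying assumption of hazard trajectories that are piecewise constant on a grid of width `dt` (this discretization and the function name are our own choices for illustration). The waiting time to the next default is obtained by threshold simulation with the summed hazard rate of the surviving firms, and the identity of the defaulting firm is then drawn with probabilities proportional to the individual hazard rates at the default time.

```python
import random


def recursive_default_simulation(hazard_paths, dt, rng):
    """Simulate (T_n, xi_n) up to the horizon T = len(path) * dt.

    hazard_paths[i][k] is the positive hazard rate of obligor i on
    [k*dt, (k+1)*dt). Returns a list of (default_time, obligor_index)
    pairs in increasing order of default time.
    """
    n_steps = len(hazard_paths[0])
    alive = set(range(len(hazard_paths)))
    defaults = []
    k, offset = 0, 0.0          # current position is T_n = k*dt + offset
    while alive and k < n_steps:
        # Step (2): waiting time by threshold simulation with summed hazard.
        threshold = rng.expovariate(1.0)
        cumulated = 0.0
        while k < n_steps:
            total = sum(hazard_paths[i][k] for i in alive)
            remaining = dt - offset
            if cumulated + total * remaining >= threshold:
                offset += (threshold - cumulated) / total
                break
            cumulated += total * remaining
            k, offset = k + 1, 0.0
        else:
            break               # Step (3): next default lies beyond T
        # Step (3): identity of the defaulting firm, P(xi = i) ~ gamma_i.
        ids = sorted(alive)
        weights = [hazard_paths[i][k] for i in ids]
        xi = rng.choices(ids, weights=weights)[0]
        defaults.append((k * dt + offset, xi))
        alive.remove(xi)        # Step (4): continue with the survivors
    return defaults


# Toy example: ten obligors with constant hazard rate 2% over five years.
rng = random.Random(0)
toy_paths = [[0.02] * 50 for _ in range(10)]
print(recursive_default_simulation(toy_paths, dt=0.1, rng=rng))
```

When defaults are rare the outer loop typically terminates after very few iterations, which is the source of the efficiency gain over simulating one threshold per firm.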


Default intensities. We begin with a general definition of default intensities in 
reduced-form portfolio credit risk models. 


Definition 17.9 (default intensity). Consider a generic filtration $(\mathcal{G}_t)$ such that the
default indicator process $(Y_{t,i})$ is $(\mathcal{G}_t)$-adapted. Then a non-negative $(\mathcal{G}_t)$-adapted
right-continuous process $(\lambda_{t,i})$ with $\int_0^t \lambda_{s,i}\,ds < \infty$ almost surely for all $t$ is called
the $(\mathcal{G}_t)$ default intensity of firm $i$ if the process

$$M_{t,i} := Y_{t,i} - \int_0^{t \wedge \tau_i} \lambda_{s,i}\,ds = Y_{t,i} - \int_0^t (1 - Y_{s,i})\lambda_{s,i}\,ds$$

is a $(\mathcal{G}_t)$-martingale.


The filtration $(\mathcal{G}_t)$ is an integral part of the definition of $(\lambda_{t,i})$. If the choice of the
underlying filtration is clear from the context, we will simply speak of the default
intensity of firm $i$. It is well known from stochastic calculus that the compensator
of $(Y_{t,i})$ (the continuous, $(\mathcal{G}_t)$-adapted process $(\Lambda_{t,i})$ such that $Y_{t,i} - \Lambda_{t,i}$ is a
$(\mathcal{G}_t)$-martingale) is unique. Since we assumed that $(\lambda_{t,i})$ is right continuous, it follows
that the default intensity $\lambda_{t,i}$ is uniquely defined for $t < \tau_i$.

In the next result we link Definition 17.9 to the informal interpretation of default 
intensities as instantaneous default probabilities. 


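Lemma 17.10. Let $(\lambda_{t,i})$ be the $(\mathcal{G}_t)$ default intensity of firm $i$ and suppose that the
process $(\lambda_{t,i})$ is bounded. We then have the equality

$$I_{\{\tau_i > t\}} \lambda_{t,i} = I_{\{\tau_i > t\}} \lim_{h \to 0+} \frac{1}{h} P(\tau_i \le t + h \mid \mathcal{G}_t).$$


Proof. Since the firm under consideration is fixed we will simply write $\tau$, $Y_t$ and $\lambda_t$
for the default time, the default indicator and the default intensity. By the definition
of $Y_t$ we have

$$P(\tau \le t + h \mid \mathcal{G}_t) = E(Y_{t+h} \mid \mathcal{G}_t) = Y_t + E(Y_{t+h} - Y_t \mid \mathcal{G}_t).$$

Since $Y_t - \int_0^t (1 - Y_s)\lambda_s\,ds$ is a $(\mathcal{G}_t)$-martingale, it follows that
$E(Y_{t+h} - Y_t \mid \mathcal{G}_t) = E(\int_t^{t+h} (1 - Y_s)\lambda_s\,ds \mid \mathcal{G}_t)$, and therefore, since $I_{\{\tau > t\}} Y_t = 0$,

$$I_{\{\tau > t\}} P(\tau \le t + h \mid \mathcal{G}_t) = I_{\{\tau > t\}} E\Big(\int_t^{t+h} (1 - Y_s)\lambda_s\,ds \,\Big|\, \mathcal{G}_t\Big). \tag{17.20}$$

The right-continuity of the process $((1 - Y_t)\lambda_t)$ implies that

$$\lim_{h \to 0+} \frac{1}{h} \int_t^{t+h} (1 - Y_s)\lambda_s\,ds = (1 - Y_t)\lambda_t \quad \text{a.s.}$$

Recalling that $(\lambda_t)$ is bounded by assumption, we can use (17.20) and the bounded
convergence theorem for conditional expectations to deduce that

$$I_{\{\tau > t\}} \lim_{h \to 0+} \frac{1}{h} P(\tau \le t + h \mid \mathcal{G}_t) = I_{\{\tau > t\}} (1 - Y_t)\lambda_t = I_{\{\tau > t\}} \lambda_t,$$

which proves the claim.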
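The interpretation of the intensity as an instantaneous default probability can be illustrated numerically in the simplest possible setting. The Python sketch below assumes a deterministic intensity and no information beyond the default history, so that on $\{\tau > t\}$ the conditional default probability over $(t, t+h]$ is $1 - \exp(-\int_t^{t+h} \lambda_s\,ds)$; the linear intensity function is a hypothetical choice.

```python
import math


def intensity(t):
    """Hypothetical deterministic default intensity lambda_t."""
    return 0.02 + 0.01 * t


def cumulative_intensity(t):
    """Integral of intensity(s) over [0, t]."""
    return 0.02 * t + 0.005 * t * t


def conditional_default_prob(t, h):
    """P(tau <= t + h | tau > t) for the deterministic intensity above."""
    increment = cumulative_intensity(t + h) - cumulative_intensity(t)
    return -math.expm1(-increment)   # 1 - exp(-increment), computed stably


t = 2.0
for h in (0.1, 0.01, 0.001):
    print(f"h = {h}: ratio = {conditional_default_prob(t, h) / h:.6f}, "
          f"intensity = {intensity(t):.6f}")
```

As $h$ decreases, the ratio of the conditional default probability to $h$ approaches the value of the intensity at $t$.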


Remark 17.11. The converse statement is also true; if the derivative

$$\lim_{h \to 0+} \frac{1}{h} P(\tau \le t + h \mid \mathcal{G}_t)$$

exists almost surely, then $\tau$ admits an intensity (under some technical conditions);
references are given in Notes and Comments.
Finally, we return to the special case of models with conditionally independent
defaults as introduced in Definition 17.2. The following proposition shows that in
this model class a default intensity of any given firm $i$ is given by the conditional
hazard process of the default time $\tau_i$.


Proposition 17.12. Let $\tau_1, \dots, \tau_m$ be conditionally independent doubly stochastic
random times with $(\mathcal{F}_t)$-conditional hazard processes $(\gamma_{t,1}), \dots, (\gamma_{t,m})$. The
processes

$$M_{t,i} := Y_{t,i} - \int_0^{t \wedge \tau_i} \gamma_{s,i}\,ds$$

are then $(\mathcal{G}_t)$-martingales.


Proof. The result follows immediately from Proposition 10.14 (which gives the
compensator of a doubly stochastic default time in the single-firm case) if we can
show that $\tau_i$ is a doubly stochastic default time with $(\mathcal{G}_t^{-i})$-conditional hazard process
$(\gamma_{t,i})$, where $(\mathcal{G}_t^{-i})$ is the artificial background filtration defined by

$$\mathcal{G}_t^{-i} = \mathcal{F}_t \vee \mathcal{H}_t^1 \vee \dots \vee \mathcal{H}_t^{i-1} \vee \mathcal{H}_t^{i+1} \vee \dots \vee \mathcal{H}_t^m, \quad t \ge 0. \tag{17.21}$$

According to Definition 10.10 we have to show that

$$P(\tau_i > t \mid \mathcal{G}_\infty^{-i}) = \exp\Big(-\int_0^t \gamma_{s,i}\,ds\Big), \quad t \ge 0. \tag{17.22}$$

This relationship is quite intuitive; since the $\tau_i$ are independent given $\mathcal{F}_\infty$, the default
history of obligor $j \ne i$ that is contained in $\mathcal{G}_\infty^{-i}$ but not in $\mathcal{F}_\infty$ has no impact on the
conditional default probability of obligor $i$.

A formal argument is as follows. Using Lemma 17.5, we may assume that there
is a vector $X$ of independent, standard exponential rvs, independent of $\mathcal{F}_\infty$, such
that for all $1 \le j \le m$ we have $\tau_j = \Gamma_j^{\leftarrow}(X_j)$. Obviously, $\tau_i$ is independent of $X_j$
for $j \ne i$, so

$$P(\tau_i > t \mid \mathcal{F}_\infty \vee \sigma(\{X_j : j \ne i\})) = P(\tau_i > t \mid \mathcal{F}_\infty) = \exp\Big(-\int_0^t \gamma_{s,i}\,ds\Big). \tag{17.23}$$

On the other hand, if we know $X_j$ and the trajectory $(\gamma_{t,j})_{0 \le t < \infty}$, we can determine
the trajectory $(Y_{t,j})_{0 \le t < \infty}$, so $\mathcal{G}_\infty^{-i} \subseteq \mathcal{F}_\infty \vee \sigma(\{X_j : j \ne i\})$ (in fact, the two
$\sigma$-algebras are equal). Since the right-hand side of (17.23) is measurable with respect
to $\mathcal{G}_\infty^{-i}$, equation (17.22) follows from (17.23).


Remark 17.13 (redundant default information). Suppose that $\tau_1, \dots, \tau_m$ are
conditionally independent doubly stochastic random times. Consider a financial
product with maturity $T$ whose discounted pay-off $H$ depends only on the evolution
of default-free security prices (which we assume to be $(\mathcal{F}_t)$-adapted) and on the
default history of a subset $A = \{i_1, \dots, i_k\}$ of the firms in the portfolio and is
therefore measurable with respect to the $\sigma$-algebra

$$\mathcal{G}_T^A := \mathcal{F}_T \vee \mathcal{H}_T^{i_1} \vee \dots \vee \mathcal{H}_T^{i_k} \subseteq \mathcal{G}_T.$$

A typical example would be the default and the premium payment leg of a
single-name CDS on a given firm $i$. In this context an argument similar to the one used to
establish (17.22) above shows that

$$E(H \mid \mathcal{G}_t) = E(H \mid \mathcal{G}_t^A);$$

in particular, the default history of the firms that do not belong to $A$ is redundant for
computing the price of $H$. Such a relationship does not hold for more general
portfolio credit risk models where the default times are not conditionally independent.


17.3.2 Examples and Applications 


In many models from the financial literature with conditionally independent defaults,
the hazard rates are modelled as linear combinations of independent affine diffusions,
possibly with jumps. A typical model takes the form

$$\gamma_{t,i} = \gamma_{i0} + \sum_{j=1}^p \gamma_{ij} \Psi_{t,j}^{\text{syst}} + \Psi_{t,i}^{\text{id}}, \quad 1 \le i \le m, \tag{17.24}$$

where $(\Psi_{t,j}^{\text{syst}})$, $1 \le j \le p$, and $(\Psi_{t,i}^{\text{id}})$, $1 \le i \le m$, are independent CIR square-root
diffusions as in (10.68) or, more generally, basic affine jump diffusions as in (10.75);
the factor weights $\gamma_{ij}$, $0 \le j \le p$, are non-negative constants.

Writing $\Psi_t^{\text{syst}} = (\Psi_{t,1}^{\text{syst}}, \dots, \Psi_{t,p}^{\text{syst}})'$, it is obvious that $(\Psi_t^{\text{syst}})$ represents
common or systematic factors, while the $(\Psi_{t,i}^{\text{id}})$ processes represent idiosyncratic factors
affecting only the hazard rate of obligor $i$. Note that the weight attached to the
idiosyncratic factor can be incorporated into the parameters of the dynamics of
$(\Psi_{t,i}^{\text{id}})$ and does not need to feature explicitly. Throughout this section we assume
that the background filtration is generated by $(\Psi_t^{\text{syst}})$ and $(\Psi_{t,i}^{\text{id}})$, $1 \le i \le m$. In
practical applications of the model, the current value of these processes is derived
from observed prices of defaultable bonds.

We now consider some examples proposed in the literature. Duffee (1999) has
estimated a model of the form (17.24) with $p = 2$; in his model all factor
processes are assumed to follow CIR square-root diffusions, so that their dynamics are
characterized by the parameter triplet $(\kappa, \bar\theta, \sigma)$ (see equation (10.68)). In Duffee’s
model, $(\Psi_t^{\text{syst}})$ represents factors driving the default-free short rate; the parameters
of these processes are estimated from treasury data. The factor weights $\gamma_{ij}$ and the
parameters of $(\Psi_{t,i}^{\text{id}})$, on the other hand, are estimated from corporate bond prices.

In their influential case study on CDO pricing, Duffie and Gârleanu (2001) use
basic affine jump diffusion processes of the form (10.75) to model the factors driving
the hazard rates. Jumps in $(\gamma_t)$ represent shocks that increase the default probability
of a firm. They consider a model with one systematic factor where
$\gamma_{t,i} = \Psi_t^{\text{syst}} + \Psi_{t,i}^{\text{id}}$, $1 \le i \le m$, and they assume that the speed of mean reversion
$\kappa$, the volatility $\sigma$ and the mean jump size $\mu$ are identical for $(\Psi_t^{\text{syst}})$ and $(\Psi_{t,i}^{\text{id}})$. It is
straightforward to show that this implies that the sum $\gamma_{t,i} = \Psi_t^{\text{syst}} + \Psi_{t,i}^{\text{id}}$ also follows
a basic affine jump diffusion with parameters $\kappa$, $\bar\theta^{\text{syst}} + \bar\theta^{\text{id}}$, $\sigma$, $(l^0)^{\text{syst}} + (l^0)^{\text{id}}$
and $\mu$; the parameters of $(\gamma_{t,i})$ used in Duffie and Gârleanu (2001) can be found in
the row labelled “base case” in Table 17.1.
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Table 17.1. Parameter sets for the Duffie-Garleanu model used in Figure 17.1. 


Parameter set K 6 o 10 


n 
Pure diffusion 0.6 0.0505 0.141 0 0 
Base case 0.6 0.02 0.141 0.2 0.1 


High jump intensity 0.6 0.0018 0.141 032 0.1 
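
A quick way to see why the three parameter sets are comparable is to look at their long-run mean hazard rates. The sketch below assumes that the basic affine jump diffusion has drift $\kappa(\bar\theta - \Psi_t)$ and compound Poisson jumps with intensity $l^0$ and mean size $\mu$, so that the stationary mean $m$ solves $\kappa(\bar\theta - m) + l^0\mu = 0$. The dictionary layout and function name are illustrative, and the jump intensity of the third parameter set is taken to be 0.32, which is an assumption about the table entry.

```python
# Long-run mean of a basic affine jump diffusion: theta_bar + l0 * mu / kappa.
# Parameter values follow Table 17.1; names and layout are hypothetical.
PARAMETER_SETS = {
    "Pure diffusion": {"kappa": 0.6, "theta_bar": 0.0505, "sigma": 0.141, "l0": 0.0, "mu": 0.0},
    "Base case": {"kappa": 0.6, "theta_bar": 0.02, "sigma": 0.141, "l0": 0.2, "mu": 0.1},
    "High jump intensity": {"kappa": 0.6, "theta_bar": 0.0018, "sigma": 0.141, "l0": 0.32, "mu": 0.1},
}


def long_run_mean(kappa, theta_bar, l0, mu, **_unused):
    """Stationary mean m solving kappa * (theta_bar - m) + l0 * mu = 0."""
    return theta_bar + l0 * mu / kappa


LONG_RUN_MEANS = {name: long_run_mean(**p) for name, p in PARAMETER_SETS.items()}
```

Under these assumptions the three long-run means lie within about half a percentage point of each other, which is consistent with the parameter sets being designed to differ mainly in how the hazard rate fluctuates rather than in its average level.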


Pricing single-name credit products. As discussed in Remark 17.13, in the frame- 
work of conditionally independent defaults the pricing formulas obtained in a 
single-firm model remain valid in the portfolio context. Moreover, when the haz- 
ard processes are specified as in (17.24), most computations can be reduced to 
one-dimensional problems involving affine processes, to which the results of Sec- 
tion 10.6 apply. As a simple example we consider the computation of the conditional 
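survival probability of obligor $i$. By Remark 17.13 and Theorem 10.19 it follows
that

$$P(\tau_i > T \mid \mathcal{G}_t) = P(\tau_i > T \mid \mathcal{G}^i_t) = I_{\{\tau_i > t\}} E\Big(\exp\Big(-\int_t^T \gamma_{s,i}\,ds\Big) \,\Big|\, \mathcal{F}_t\Big).$$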


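For hazard processes of the form (17.24) this equals

$$I_{\{\tau_i > t\}}\, e^{-\gamma_{i0}(T-t)}\, E\Big(\exp\Big(-\int_t^T \Psi^{\mathrm{id}}_{s,i}\,ds\Big) \,\Big|\, \mathcal{F}_t\Big) \prod_{j=1}^{p} E\Big(\exp\Big(-\gamma_{ij}\int_t^T \Psi^{\mathrm{syst}}_{s,j}\,ds\Big) \,\Big|\, \mathcal{F}_t\Big). \tag{17.25}$$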


Each of the conditional expectations in (17.25) can now be computed using the 
results for one-dimensional affine models from Section 10.6. More general models, 
where hazard rates are given by a general multivariate affine process (and not simply 
by a linear combination of independent one-dimensional affine processes) can be 
dealt with using the general affine-model technology developed by Duffie, Pan and 
Singleton (2000). 
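
To make the factorization in (17.25) concrete, the following sketch evaluates each conditional expectation in closed form for the special case where every factor is a CIR diffusion without jumps. It assumes the dynamics $d\Psi_t = \kappa(\bar\theta - \Psi_t)\,dt + \sigma\sqrt{\Psi_t}\,dW_t$ and uses the classical exponential-affine formula for $E(\exp(-w\int_0^T \Psi_s\,ds))$; the function names and the tuple layout for factors are hypothetical.

```python
import math


def cir_integral_transform(weight, horizon, psi0, kappa, theta_bar, sigma):
    """E[exp(-weight * int_0^horizon Psi_s ds)] for a CIR process started at psi0.

    Closed form A * exp(-B * psi0) with h = sqrt(kappa^2 + 2 * weight * sigma^2).
    """
    if weight == 0.0:
        return 1.0
    h = math.sqrt(kappa ** 2 + 2.0 * weight * sigma ** 2)
    growth = math.exp(h * horizon) - 1.0
    denom = (h + kappa) * growth + 2.0 * h
    b = 2.0 * weight * growth / denom
    base = 2.0 * h * math.exp(0.5 * (kappa + h) * horizon) / denom
    a = base ** (2.0 * kappa * theta_bar / sigma ** 2)
    return a * math.exp(-b * psi0)


def survival_probability(horizon, gamma_i0, weights, syst_factors, idio_factor=None):
    """Conditional survival probability (17.25) on {tau_i > t}, with horizon = T - t.

    weights[j] is gamma_ij; each factor is a tuple (psi0, kappa, theta_bar, sigma)
    holding the current factor value and its CIR parameters.
    """
    prob = math.exp(-gamma_i0 * horizon)
    if idio_factor is not None:
        prob *= cir_integral_transform(1.0, horizon, *idio_factor)
    for w, factor in zip(weights, syst_factors):
        prob *= cir_integral_transform(w, horizon, *factor)
    return prob
```

For basic affine jump diffusions the same product structure applies, but each one-dimensional transform then needs the jump-adjusted formulas rather than the pure CIR expression used here.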


The implied one-period model and computation of the loss distribution. The con- 
ditional independence assumption and the factor structure (17.24) of the hazard 
processes have interesting implications for the distribution of the default indicators 
at a fixed time point. 

For simplicity we suppose that there are no idiosyncratic factors $(\Psi^{\mathrm{id}}_{t,i})$ in the
model. We fix $T > 0$ and consider the random vector $Y_T = (Y_{T,1}, \dots, Y_{T,m})'$. For
$y \in \{0, 1\}^m$ we can compute that

$$P(Y_T = y) = E(P(Y_T = y \mid \mathcal{F}_\infty)) = E\Big(\prod_{j \colon y_j = 1} P(\tau_j \le T \mid \mathcal{F}_\infty) \prod_{j \colon y_j = 0} P(\tau_j > T \mid \mathcal{F}_\infty)\Big).$$

From (17.24) we also know that

$$P(\tau_i \le T \mid \mathcal{F}_\infty) = 1 - \exp\Big(-\gamma_{i0} T - \sum_{j=1}^{p} \gamma_{ij} \int_0^T \Psi^{\mathrm{syst}}_{s,j}\,ds\Big). \tag{17.26}$$

This argument shows that $Y_T$ follows a Bernoulli mixture model with $p$-factor
structure as in Definition 11.5. The factor vector is given by

$$\Psi := \Big(\int_0^T \Psi^{\mathrm{syst}}_{s,1}\,ds, \dots, \int_0^T \Psi^{\mathrm{syst}}_{s,p}\,ds\Big)',$$

and the conditional default probabilities $p_i(\Psi)$ are given in (17.26).
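
The Bernoulli mixture structure can be illustrated with a few lines of code. The sketch below takes a realization of the factor vector (the integrated systematic factors) as given, evaluates the conditional default probabilities in (17.26), and multiplies them up to the conditional probability of a default pattern $y$. All function and argument names are hypothetical.

```python
import math


def conditional_default_probs(horizon, gamma0, weights, factor_integrals):
    """p_i(Psi) from (17.26); factor_integrals[j] stands for int_0^T Psi^syst_{s,j} ds."""
    probs = []
    for g0, row in zip(gamma0, weights):
        exponent = g0 * horizon + sum(w * x for w, x in zip(row, factor_integrals))
        probs.append(1.0 - math.exp(-exponent))
    return probs


def conditional_pattern_prob(y, probs):
    """P(Y_T = y | F_infinity) for a 0/1 pattern y, using conditional independence."""
    result = 1.0
    for y_i, p_i in zip(y, probs):
        result *= p_i if y_i == 1 else 1.0 - p_i
    return result
```

Averaging `conditional_pattern_prob` over simulated factor realizations would give a Monte Carlo estimate of the unconditional probability $P(Y_T = y)$.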

For practical purposes, such as CDO pricing, we need to be able to evaluate the
distribution of the portfolio loss $L_T = \sum_{i=1}^{m} \delta_i Y_{T,i}$ in this model. As explained
in Section 11.2.1 it is useful to be able to compute the Laplace–Stieltjes transform
$\hat F_{L_T}(\lambda) = E(e^{-\lambda L_T})$, since the distribution of $L_T$ can then be determined using
inversion techniques for transforms.

In Section 11.2.5 we noted that it is quite common in practice to approximate
Bernoulli mixture models with Poisson mixture models, which are often more
tractable. We will derive the Laplace–Stieltjes transform for an approximating
Poisson mixture model and see that it consists of terms that can be computed using results
for affine processes in Section 10.6.3. More precisely, we replace $L_T$ by the rv
$\tilde L_T := \sum_{i=1}^{m} \delta_i \tilde Y_{T,i}$, and we assume that, conditional on $\mathcal{F}_\infty$, the rvs $\tilde Y_{T,1}, \dots, \tilde Y_{T,m}$
are independent, Poisson-distributed rvs with parameters

$$\Gamma_{T,i} = \gamma_{i0} T + \sum_{j=1}^{p} \gamma_{ij} \int_0^T \Psi^{\mathrm{syst}}_{s,j}\,ds, \quad 1 \le i \le m. \tag{17.27}$$
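We also assume that the losses given default $\delta_1, \dots, \delta_m$ are deterministic and we use
the fact that the Laplace–Stieltjes transform of a generic Poisson rv $N \sim \mathrm{Poi}(\Gamma)$ is
given by $\hat F_N(\lambda) = E(e^{-\lambda N}) = \exp(\Gamma(e^{-\lambda} - 1))$. Using the conditional independence
of $\tilde Y_{T,1}, \dots, \tilde Y_{T,m}$ it follows that

$$
\begin{aligned}
E(e^{-\lambda \tilde L_T} \mid \mathcal{F}_\infty) &= \prod_{i=1}^{m} E(e^{-\lambda \delta_i \tilde Y_{T,i}} \mid \mathcal{F}_\infty) = \prod_{i=1}^{m} \exp(\Gamma_{T,i}(e^{-\lambda \delta_i} - 1)) \\
&= \exp\Big(\sum_{i=1}^{m} (e^{-\lambda \delta_i} - 1)\Gamma_{T,i}\Big).
\end{aligned}
$$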


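Using the definition of $\Gamma_{T,i}$ in (17.27) and the independence of the systematic
risk-factor processes $(\Psi^{\mathrm{syst}}_{t,1}), \dots, (\Psi^{\mathrm{syst}}_{t,p})$, we obtain

$$
\begin{aligned}
\hat F_{\tilde L_T}(\lambda) &= E(E(e^{-\lambda \tilde L_T} \mid \mathcal{F}_\infty)) \\
&= \exp\Big(\sum_{i=1}^{m} (e^{-\lambda \delta_i} - 1)\gamma_{i0} T\Big) \prod_{j=1}^{p} E\Big(\exp\Big(\Big(\sum_{i=1}^{m} \gamma_{ij}(e^{-\lambda \delta_i} - 1)\Big) \int_0^T \Psi^{\mathrm{syst}}_{s,j}\,ds\Big)\Big).
\end{aligned}
\tag{17.28}
$$

Expression (17.28) can be computed using the results for one-dimensional affine
models in Section 10.6.3. For further refinements and references to Laplace inversion
methods we refer to Di Graziano and Rogers (2009).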
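
The conditional step of this derivation is easy to check numerically. The sketch below evaluates the conditional Laplace–Stieltjes transform $\exp(\sum_i (e^{-\lambda\delta_i} - 1)\Gamma_{T,i})$ for given Poisson parameters and, for a single obligor, compares it with a brute-force summation over the Poisson probability mass function. Function names are hypothetical and the values of $\Gamma_{T,i}$ are treated as known inputs.

```python
import math


def conditional_lst(lam, deltas, gammas):
    """E[exp(-lam * L_tilde_T) | F_infinity] for Poisson parameters gammas[i] = Gamma_{T,i}."""
    return math.exp(sum((math.exp(-lam * d) - 1.0) * g for d, g in zip(deltas, gammas)))


def poisson_lst_by_summation(lam, delta, gamma, terms=60):
    """Brute-force check for one obligor: sum_k exp(-lam * delta * k) * Poisson(k; gamma)."""
    total, pmf = 0.0, math.exp(-gamma)
    for k in range(terms):
        total += math.exp(-lam * delta * k) * pmf
        pmf *= gamma / (k + 1)  # recursion Poisson(k + 1) = Poisson(k) * gamma / (k + 1)
    return total
```

The unconditional transform (17.28) is obtained by integrating this expression over the distribution of the integrated factors, which is where the affine-process formulas enter.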


Default correlation. As we have seen in Chapter 10, default correlations (defined
as the correlations $\rho(Y_{T,i}, Y_{T,j})$, $i \ne j$, between default indicators) are closely
related to the heaviness of the tail of the credit loss distribution. In computing
default correlations in models with conditionally independent defaults it is more
convenient to work with the survival indicator $1 - Y_{T,i}$. By the definition of linear
correlation we have

$$
\begin{aligned}
\rho(Y_{T,i}, Y_{T,j}) &= \rho(1 - Y_{T,i}, 1 - Y_{T,j}) \\
&= \frac{P(\tau_i > T, \tau_j > T) - \bar F_i(T) \bar F_j(T)}{(\bar F_i(T)(1 - \bar F_i(T)))^{1/2} (\bar F_j(T)(1 - \bar F_j(T)))^{1/2}}.
\end{aligned}
\tag{17.29}
$$


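For models with hazard rates as in (17.24), the computation of the survival probabilities
$\bar F_i(T)$ has been discussed above. For the joint survival probability we obtain,
using conditional independence,

$$
\begin{aligned}
P(\tau_i > T, \tau_j > T) &= E(P(\tau_i > T, \tau_j > T \mid \mathcal{F}_\infty)) \\
&= E(P(\tau_i > T \mid \mathcal{F}_\infty) P(\tau_j > T \mid \mathcal{F}_\infty)) \\
&= E\Big(\exp\Big(-\int_0^T (\gamma_{s,i} + \gamma_{s,j})\,ds\Big)\Big).
\end{aligned}
\tag{17.30}
$$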


For hazard rates of the form (17.24), expression (17.30) can be decomposed—using 
a similar approach to the decomposition in (17.25)—into terms that can be evaluated 
using our results on one-dimensional affine models. 
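
As an illustration of (17.29) and (17.30), the sketch below computes the default correlation in the simplest exchangeable case: a single systematic CIR factor without jumps and without idiosyncratic components, so that $\gamma_{t,i} = \Psi_t$ for every obligor. The marginal survival probability is then $E(\exp(-\int_0^T \Psi_s\,ds))$ and the joint survival probability is $E(\exp(-2\int_0^T \Psi_s\,ds))$. The closed-form CIR transform and all names are assumptions made for this example, not the jump-diffusion specification of Duffie and Gârleanu.

```python
import math


def cir_integral_transform(weight, horizon, psi0, kappa, theta_bar, sigma):
    """E[exp(-weight * int_0^horizon Psi_s ds)] for a CIR process started at psi0."""
    h = math.sqrt(kappa ** 2 + 2.0 * weight * sigma ** 2)
    growth = math.exp(h * horizon) - 1.0
    denom = (h + kappa) * growth + 2.0 * h
    b = 2.0 * weight * growth / denom
    base = 2.0 * h * math.exp(0.5 * (kappa + h) * horizon) / denom
    return base ** (2.0 * kappa * theta_bar / sigma ** 2) * math.exp(-b * psi0)


def default_correlation(horizon, psi0, kappa, theta_bar, sigma):
    """rho(Y_{T,i}, Y_{T,j}) from (17.29)-(17.30) when gamma_{t,i} = Psi_t for all i."""
    surv = cir_integral_transform(1.0, horizon, psi0, kappa, theta_bar, sigma)
    joint = cir_integral_transform(2.0, horizon, psi0, kappa, theta_bar, sigma)
    return (joint - surv ** 2) / (surv * (1.0 - surv))
```

Calling `default_correlation` with increasing values of `sigma` shows how the volatility of the common factor drives the attainable correlation in a pure diffusion model.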

It is often claimed that the default correlation values that can be attained in models
with conditionally independent defaults are too low compared with empirical default
correlations (see, for example, Hull and White 2001; Schönbucher and Schubert
2001). Since default correlations do have a significant impact on the loss distribution
generated by a model, we discuss this issue further. As a concrete example we use the
Duffie–Gârleanu model and assume that there are no idiosyncratic factor processes
$(\Psi^{\mathrm{id}}_{t,i})$ and that all risk is systematic. As discussed above, in that case the default
indicator vector $Y_T$ follows an exchangeable Bernoulli mixture model with mixing
variable $Q$ given by $1 - \exp(-\int_0^T \Psi^{\mathrm{syst}}_s\,ds)$.

We have seen in Section 11.2.2 that in exchangeable Bernoulli mixture models
every default correlation $\rho_Y \in (0, 1)$ can be obtained by choosing the variance
of the mixing variable sufficiently high. It follows that in the Duffie–Gârleanu
model high levels of default correlation can be obtained if the variance of the rv
$\Gamma_T := \int_0^T \Psi^{\mathrm{syst}}_s\,ds$ is sufficiently high. A high variance of $\Gamma_T$ can be obtained by
choosing a high value for the volatility $\sigma$ of the diffusion part of $(\Psi^{\mathrm{syst}}_t)$ or by
choosing a high value for the mean of the jump-size distribution $\mu$ or for the jump
intensity $l^0$. A high value for $\sigma$ translates into very volatile day-to-day fluctuations
of credit spreads, which might contradict the behaviour of real bond-price data. This
shows that it might be difficult to generate very high levels of default correlation in
models where hazard rates follow pure diffusion processes.

Figure 17.1. Both plots relate to the Duffie–Gârleanu model with $(\Psi^{\mathrm{id}}_{t,i}) \equiv 0$ and different
parameter sets for $(\Psi^{\mathrm{syst}}_t)$, which are given in Table 17.1. (a) The default correlations for
varying time to maturity. We see that by increasing the intensity of jumps in $(\Psi^{\mathrm{syst}}_t)$, the default
correlation is increased substantially. Part (b) shows that the survival probabilities for the
three parameter sets are essentially equal, so that the differences in default correlations are
solely due to the impact of the dynamics of $(\Psi^{\mathrm{syst}}_t)$ on the dependence structure of the default
times.

Instead, in the Duffie–Gârleanu model we can increase the frequency or size
of the jumps in the hazard process by increasing $l^0$ or $\mu$. This is a very effective
mechanism for generating default correlation, as is shown in Figure 17.1. In fact, this
additional flexibility in modelling default correlations is an important motivation for
considering affine jump diffusions instead of the simpler CIR diffusion models.

These qualitative findings obviously carry over to other models with conditionally 
independent defaults. Summing up, we conclude that it is certainly possible to 
generate high levels of default correlation in models with conditionally independent 
defaults; however, the required models for the hazard processes can become quite 
complex. 


17.3.3 Credit Value Adjustments 


Finally, we turn to the computation of uncollateralized credit value adjustments 
for credit default swaps in models with conditionally independent default times. 
We first recall the notation of Section 17.2.1: $\tau_S$, $\tau_B$ and $\tau_R$ are the default times
of the protection seller, the protection buyer and the reference entity in a given
CDS contract; $\tau_S$, $\tau_B$ and $\tau_R$ are conditionally independent under the risk-neutral
pricing measure $Q$ with $(\mathcal{F}_t)$-conditional hazard processes $\gamma^S$, $\gamma^B$ and $\gamma^R$; the short
rate is given by some $(\mathcal{F}_t)$-adapted process $(r_t)$; recovery rates are deterministic;
$T_1 := \tau_S \wedge \tau_B \wedge \tau_R$ is the time of the first default; $D(0, t) = \exp(-\int_0^t r_u\,du)$.

As in Section 17.2.1 we denote by $V_t = E^Q(\Pi(t, T) \mid \mathcal{G}_t)$ the net present value
of the promised cash flows of the protection-buyer position in the CDS. It was shown
in Section 10.5.3 that $V_t = I_{\{\tau_R > t\}} \tilde V_t$, where the $(\mathcal{F}_t)$-adapted process $(\tilde V_t)$ is given
by

$$
\begin{aligned}
\tilde V_t ={}& \delta^R E^Q\Big(\int_t^T \gamma^R_s \exp\Big(-\int_t^s (r_u + \gamma^R_u)\,du\Big)\,ds \,\Big|\, \mathcal{F}_t\Big) \\
&- x E^Q\Big(\sum_{\{t_n > t\}} (t_n - t_{n-1}) \exp\Big(-\int_t^{t_n} (r_u + \gamma^R_u)\,du\Big) \,\Big|\, \mathcal{F}_t\Big),
\end{aligned}
\tag{17.31}
$$

and the time points $t_n$ represent the premium payment dates.
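
A general formula. For simplicity we consider the case $t = 0$. According to Proposition 17.1,
the bilateral uncollateralized value adjustment (for the protection buyer)
is given by

$$
\begin{aligned}
\mathrm{BCVA} &= \mathrm{CVA} - \mathrm{DVA} \\
&= E^Q\big(I_{\{T_1 < T\}} I_{\{\xi = S\}} D(0, T_1) \delta^S (\tilde V_{T_1})^+\big) - E^Q\big(I_{\{T_1 < T\}} I_{\{\xi = B\}} D(0, T_1) \delta^B (\tilde V_{T_1})^-\big).
\end{aligned}
\tag{17.32}
$$

Here we have used the fact that $V_{T_1} = \tilde V_{T_1}$ on $\{\xi = S\}$ or $\{\xi = B\}$. We concentrate
on the CVA term in (17.32). Recall from Proposition 17.7 that

$$Q(\xi = S \mid \sigma(T_1) \vee \mathcal{F}_\infty) = \frac{\gamma^S_{T_1}}{\gamma^S_{T_1} + \gamma^B_{T_1} + \gamma^R_{T_1}}.$$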


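By double conditioning we therefore obtain that

$$
\begin{aligned}
\mathrm{CVA} &= E^Q\big(E^Q\big(I_{\{T_1 < T\}} I_{\{\xi = S\}} D(0, T_1) \delta^S (\tilde V_{T_1})^+ \mid \sigma(T_1) \vee \mathcal{F}_\infty\big)\big) \\
&= E^Q\big(I_{\{T_1 < T\}} D(0, T_1) \delta^S (\tilde V_{T_1})^+ Q(\xi = S \mid \sigma(T_1) \vee \mathcal{F}_\infty)\big) \\
&= E^Q\Big(I_{\{T_1 < T\}} D(0, T_1) \delta^S (\tilde V_{T_1})^+ \frac{\gamma^S_{T_1}}{\gamma^S_{T_1} + \gamma^B_{T_1} + \gamma^R_{T_1}}\Big).
\end{aligned}
$$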


In the terminology of Section 10.5.2 this is a typical payment-at-default claim with
payment at the stopping time $T_1$. By Lemma 17.6, $T_1$ is a doubly stochastic random
time with hazard rate $\gamma^S + \gamma^B + \gamma^R$. Applying the third identity in Theorem 10.19
gives

$$\mathrm{CVA} = E^Q\Big(\int_0^T \gamma^S_t \delta^S (\tilde V_t)^+ \exp\Big(-\int_0^t (r_s + \gamma^S_s + \gamma^B_s + \gamma^R_s)\,ds\Big)\,dt\Big). \tag{17.33}$$
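
To get a feeling for the structure of (17.33), consider the strongly simplified situation in which the short rate and all three hazard rates are constants and the positive part of the pre-default CDS value is a known deterministic function of time. The expectation then disappears and the CVA reduces to a one-dimensional integral. The following sketch approximates this integral with the trapezoidal rule; the function name, the `exposure` callable and the constant-rate assumption are all hypothetical simplifications and not part of the general model.

```python
import math


def cva_constant_rates(horizon, r, gamma_s, gamma_b, gamma_r, delta_s, exposure, steps=2000):
    """Trapezoidal approximation of (17.33) with constant r and constant hazard rates.

    exposure(t) is a deterministic stand-in for the positive part of the
    pre-default CDS value at time t.
    """
    total_rate = r + gamma_s + gamma_b + gamma_r
    dt = horizon / steps

    def integrand(t):
        return gamma_s * delta_s * exposure(t) * math.exp(-total_rate * t)

    total = 0.5 * (integrand(0.0) + integrand(horizon))
    total += sum(integrand(k * dt) for k in range(1, steps))
    return total * dt
```

For a constant exposure $c$ the integral has the closed form $\gamma^S \delta^S c\,(1 - e^{-aT})/a$ with $a = r + \gamma^S + \gamma^B + \gamma^R$, which gives a convenient check of the quadrature.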
A similar computation for the DVA term in (17.32) yields

$$\mathrm{DVA} = E^Q\Big(\int_0^T \gamma^B_t \delta^B (\tilde V_t)^- \exp\Big(-\int_0^t (r_s + \gamma^S_s + \gamma^B_s + \gamma^R_s)\,ds\Big)\,dt\Big). \tag{17.34}$$

More explicit results. In order to evaluate (17.33) and (17.34) we need to make
more specific assumptions about the form of the hazard rates. Here we derive a PDE
for the value adjustments in a one-factor CIR model. More precisely, we assume
that $\gamma^R$, $\gamma^B$, $\gamma^S$ and the short rate $r$ are affine functions of a one-dimensional CIR
process $(\Psi_t)$ with parameters $\kappa$, $\bar\theta$ and $\sigma$. Therefore,

$$\gamma^R_t = \gamma^R(t, \Psi_t) = a^R \Psi_t + b^R \tag{17.35}$$

for $a^R > 0$, $b^R > 0$, and similarly for $\gamma^B$, $\gamma^S$ and $r$. In that case, the pre-default value
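$\tilde V_t$ of the CDS is given by a function $v(t, \Psi_t)$ that can be easily computed using the
affine-model techniques introduced in Section 10.6. Moreover, the Feynman–Kac
formula (see Lemma 10.21) gives

$$E^Q\Big(\int_t^T \gamma^S_s \delta^S v^+(s, \Psi_s) \exp\Big(-\int_t^s (\gamma^R_u + \gamma^B_u + \gamma^S_u + r_u)\,du\Big)\,ds \,\Big|\, \mathcal{F}_t\Big) = h^S(t, \Psi_t),$$

where the function $h^S$ solves the PDE

$$h^S_t + \kappa(\bar\theta - \psi) h^S_\psi + \tfrac{1}{2}\sigma^2 \psi h^S_{\psi\psi} + \gamma^S \delta^S v^+ = (r + \gamma^R + \gamma^B + \gamma^S) h^S \tag{17.36}$$

with terminal value $h^S(T, \psi) = 0$, which follows directly from the integral representation.
Here the arguments $(t, \psi)$ have been omitted for ease of notation. The corresponding
PDE for the buyer adjustment is

$$h^B_t + \kappa(\bar\theta - \psi) h^B_\psi + \tfrac{1}{2}\sigma^2 \psi h^B_{\psi\psi} + \gamma^B \delta^B v^- = (r + \gamma^R + \gamma^B + \gamma^S) h^B, \tag{17.37}$$

again with terminal value $h^B(T, \psi) = 0$. Both PDEs can be solved numerically.
Summarizing we obtain, in the special case of the one-factor CIR model, that

$$\mathrm{BCVA} = h^S(t, \Psi_t) - h^B(t, \Psi_t). \tag{17.38}$$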


Notes and Comments 


For an alternative textbook-level treatment of models with conditionally indepen- 
dent defaults, see, for example, Chapter 9 of Bielecki and Rutkowski (2002). The 
simulation of conditionally independent default times is discussed in Duffie and 
Singleton (2003). The existence of default intensities is discussed in Aven (1985) 
and in Section 2 of Janson, M'Baye and Protter (2011). Both sources contain results
that can be viewed as partial converses to Lemma 17.10. A model with condition- 
ally independent defaults where the factor W follows a finite-state Markov chain is 
considered by Di Graziano and Rogers (2009). 

Further empirical work on affine models for credit portfolios includes that of 
Duffee (1999) and Driessen (2005). We refer the reader to Chapter 10 of Filipović 
(2009) for further information on affine processes. The empirical estimation of mod- 
els with conditionally independent defaults and various related issues are discussed 
in Duffie (2011).


17.4 Credit Risk Models with Incomplete Information


In this section we are concerned with reduced-form portfolio credit risk models
under incomplete information. We consider models where the default times are
independent given the realization of some common factor; in principle, this factor
could be a stochastic process $(\Psi_t)$, as in the models with conditionally independent
defaults considered in Section 17.3, but for expository purposes we concentrate on
the case where the factor is simply a one-dimensional random variable $V$. We assume
that $V$ is not directly observable by investors. Rather, their information set consists
of the default history and, in the more advanced versions of the model, additional
noisy observations of the factor represented by an auxiliary $\sigma$-algebra.

It will turn out that portfolio credit risk models with incomplete information have a 
number of attractive features. To begin with, they are able to generate rich dynamics 
of credit spreads incorporating both default contagion and credit-spread volatility 
effects. Moreover, the pricing of typical credit derivatives can be carried out using 
a natural and fairly efficient two-step procedure. 

Our presentation starts with a general discussion of credit risk and incomplete 
information in Section 17.4.1. In Section 17.4.2 we consider simple models in which 
investors observe only the default history; the extension to a richer information set 
and applications to counterparty risk management are discussed in Sections 17.4.3 
and 17.4.4. 


17.4.1 Credit Risk and Incomplete Information 


Throughout this section we use the following set-up. We work on a filtered probability
space $(\Omega, \mathcal{F}, (\tilde{\mathcal{G}}_t), Q)$, where $Q$ is the risk-neutral measure used for pricing
derivatives and $(\tilde{\mathcal{G}}_t)$ is the global filtration, so all processes introduced will be
$(\tilde{\mathcal{G}}_t)$-adapted. We consider a portfolio of $m$ obligors with default times $\tau_1, \dots, \tau_m$ and
default indicator processes $(Y_{t,i})$ given by $Y_{t,i} = I_{\{\tau_i \le t\}}$, $1 \le i \le m$. We assume
that there is a random variable $V$ on $(\Omega, \mathcal{F}, Q)$ such that the rvs $\tau_1, \dots, \tau_m$ are
conditionally independent given $V$, with conditional survival functions of the form

$$\bar F_{\tau_i \mid V}(t \mid v) = \exp\Big(-\int_0^t \gamma_i(v, s)\,ds\Big) \tag{17.39}$$

for continuous functions $\gamma_i \colon \mathbb{R} \times [0, \infty) \to (0, \infty)$, such that $\int_0^t \gamma_i(v, s)\,ds < \infty$
for all $t$ and $v$. Note that this implies that $\gamma_i(v, t)$ is the conditional hazard function,
and the conditional density $f_{\tau_i \mid V}(t \mid v)$ satisfies

$$f_{\tau_i \mid V}(t \mid v) = \gamma_i(v, t) \exp\Big(-\int_0^t \gamma_i(v, s)\,ds\Big). \tag{17.40}$$

We consider two cases for the distribution of the factor $V$. In the first case $V$ is
an absolutely continuous rv with density $g_V(v)$, as in the factor copula models
considered in Section 12.2.2. In the second case $V$ is a discrete random variable
with values in the set $S^V := \{v_1, \dots, v_K\}$ and probability mass function $\pi_k =
Q(V = v_k)$, $1 \le k \le K$, as in the class of implied copula models considered in
Section 12.3.3. Finally, we always take the loss given default $\delta_i$ of the firms in the
portfolio and the default-free interest rate $r(t)$ to be deterministic.

The full-information filtration is given by $\tilde{\mathcal{G}}_t = \mathcal{H}_t \vee \sigma(V) \vee \sigma(B_s \colon s \le t)$,
$t \ge 0$. Here, $(\mathcal{H}_t)$ is the default history of the portfolio and $(B_t)$ is a Brownian
motion independent of $V$ and of the $\tau_i$. (The process $(B_t)$ is used in Section 17.4.3
as a building block in modelling the noisy information available to investors.) If
considered under the full-information filtration $(\tilde{\mathcal{G}}_t)$, the model is therefore a model
with conditionally independent defaults as discussed in Section 17.3. However, as
mentioned before, the rv $V$ is not observable by investors. We therefore describe
the information available to investors by a filtration $(\mathcal{G}_t)$ with $\mathcal{G}_t \subset \tilde{\mathcal{G}}_t$, $t \ge 0$, and
refer to $(\mathcal{G}_t)$ as the investor filtration. We assume throughout that investors are able
to observe the default history of the portfolio, i.e. $\mathcal{H}_t \subset \mathcal{G}_t$ for all $t \ge 0$. On the
other hand, $V$ is not $\mathcal{G}_t$-measurable for all $t \ge 0$.


Pricing of credit derivatives. Consider a single-name CDS, a CDS index or a CDO
tranche on the firms in the portfolio (or a subgroup thereof), and denote by $\Pi(t, T)$
the corresponding stream of discounted future cash flows. In the setting of this section,
the price of this cash-flow stream at time $t$ is given by $V_t = E^Q(\Pi(t, T) \mid \mathcal{G}_t)$.
Note that $V_t$ is given by a conditional expectation with respect to $\mathcal{G}_t$, the information
actually available to investors at time $t$.

The price $V_t$ can be computed in two steps. By iterated conditional expectations
we obtain that

$$V_t = E^Q\big(E^Q(\Pi(t, T) \mid \tilde{\mathcal{G}}_t) \mid \mathcal{G}_t\big). \tag{17.41}$$

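In the first step we consider the inner conditional expectation. In order to evaluate
this expectation we need to determine the $\tilde{\mathcal{G}}_t$-conditional distribution of the
non-defaulted firms. Denote by $\{i_1, \dots, i_\ell\} \subset \{1, \dots, m\}$ the identity of these firms.
Since the Brownian motion $(B_t)$ is independent of $V$ and the default times $\tau_i$, we
find, for arbitrary $t_1, \dots, t_\ell \ge t$, that

$$Q(\tau_{i_1} > t_1, \dots, \tau_{i_\ell} > t_\ell \mid \tilde{\mathcal{G}}_t) = Q(\tau_{i_1} > t_1, \dots, \tau_{i_\ell} > t_\ell \mid \mathcal{H}_t \vee \sigma(V)).$$

Since the $\tau_i$ are independent given $V$, it follows that

$$
\begin{aligned}
Q(\tau_{i_1} > t_1, \dots, \tau_{i_\ell} > t_\ell \mid \mathcal{H}_t \vee \sigma(V)) &= \prod_{j=1}^{\ell} Q(\tau_{i_j} > t_j \mid \mathcal{H}_t \vee \sigma(V)) \\
&= \prod_{j=1}^{\ell} I_{\{\tau_{i_j} > t\}} \frac{Q(\tau_{i_j} > t_j \mid \sigma(V))}{Q(\tau_{i_j} > t \mid \sigma(V))} \\
&= \prod_{j=1}^{\ell} I_{\{\tau_{i_j} > t\}} \exp\Big(-\int_t^{t_j} \gamma_{i_j}(V, s)\,ds\Big).
\end{aligned}
$$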


The firms in the portfolio are therefore conditionally independent given $\tilde{\mathcal{G}}_t$, with
conditional survival probabilities

$$Q(\tau_i > s \mid \tilde{\mathcal{G}}_t) = I_{\{\tau_i > t\}} \exp\Big(-\int_t^s \gamma_i(V, u)\,du\Big), \quad s \ge t. \tag{17.42}$$

As in Section 12.3.1 it follows that the value $E^Q(\Pi(t, T) \mid \tilde{\mathcal{G}}_t)$ (the price with respect
to the larger $\sigma$-algebra $\tilde{\mathcal{G}}_t$) depends only on the time $t$, the realized value of the factor
$V$ and the realized value of the default indicator vector $Y_t = (Y_{t,1}, \dots, Y_{t,m})'$ at
time $t$. In summary, this means that $E^Q(\Pi(t, T) \mid \tilde{\mathcal{G}}_t) = h(t, V, Y_t)$ for a suitable
function $h(\cdot)$. For instance, in the case of a protection-buyer position in a single-name
CDS on firm $i$ with premium payment dates $t_1, \dots, t_N > t$ and spread $x$, we
have that

$$
\begin{aligned}
h(t, v, y) = (1 - y_i)\Big(&\delta_i \int_t^{t_N} \gamma_i(v, s) \exp\Big(-\int_t^s (r(u) + \gamma_i(v, u))\,du\Big)\,ds \\
&- x \sum_{n=1}^{N} (t_n - t_{n-1}) \exp\Big(-\int_t^{t_n} (r(s) + \gamma_i(v, s))\,ds\Big)\Big).
\end{aligned}
$$

For CDO tranches, the function $h(\cdot)$ can be computed using the analytics for factor
credit risk models developed in Section 12.3.1.
Using (17.41) we infer that the price of the cash-flow stream at time $t$ is given by

$$V_t = E^Q(h(t, V, Y_t) \mid \mathcal{G}_t) = \int_{\mathbb{R}} h(t, v, Y_t)\, g_{V \mid \mathcal{G}_t}(v)\,dv, \tag{17.43}$$

where $g_{V \mid \mathcal{G}_t}(v)$ represents the conditional density of $V$ given $\mathcal{G}_t$. If $V$ is discrete,
the last integral is replaced by the sum

$$\sum_{k=1}^{K} h(t, v_k, Y_t)\, Q(V = v_k \mid \mathcal{G}_t).$$

The pricing formulas therefore have a similar structure to those in the static factor
models considered in Section 12.3, but the unconditional density $g_V(v)$ and the
unconditional probability mass function $\pi$ have to be replaced by the conditional
density $g_{V \mid \mathcal{G}_t}(v)$ and the conditional probabilities $\pi^k_t = Q(V = v_k \mid \mathcal{G}_t)$, $1 \le k \le K$.
The computation of these quantities is discussed below.


Default intensities. A similar two-step procedure applies to the computation of
default intensities. Since defaults are conditionally independent given $V$, we can
apply Proposition 17.12 to infer that the default intensity of firm $i$ with respect to
the global filtration $(\tilde{\mathcal{G}}_t)$ satisfies $\tilde\lambda_{t,i} = \gamma_i(V, t)$. In order to compute the intensity
of $\tau_i$ with respect to the investor filtration $(\mathcal{G}_t)$, we use the following general result.


Lemma 17.14. Consider two filtrations $(\mathcal{G}_t)$ and $(\tilde{\mathcal{G}}_t)$ such that $\mathcal{G}_t \subset \tilde{\mathcal{G}}_t$ for all $t \ge 0$
and some random time $\tau$ such that the corresponding indicator process $Y_t = I_{\{\tau \le t\}}$
is $(\mathcal{G}_t)$-adapted. Suppose that $\tau$ admits the $(\tilde{\mathcal{G}}_t)$-intensity $\tilde\lambda_t$. Then the $(\mathcal{G}_t)$ default
intensity is given by the right-continuous version of the process $\lambda_t = E(\tilde\lambda_t \mid \mathcal{G}_t)$.


Lemma 17.14 is a special case of Theorem 14 in Chapter 2 of Brémaud (1981)
and we refer to that source for a proof. Applying Lemma 17.14 we conclude that
$(\lambda_{t,i})$, the $(\mathcal{G}_t)$ default intensity of firm $i$, is given by

$$\lambda_{t,i} = E(\gamma_i(V, t) \mid \mathcal{G}_t) = \int_{\mathbb{R}} \gamma_i(v, t)\, g_{V \mid \mathcal{G}_t}(v)\,dv; \tag{17.44}$$

in the case where $V$ is discrete, the last integral is replaced by the sum

$$\sum_{k=1}^{K} \gamma_i(v_k, t)\, Q(V = v_k \mid \mathcal{G}_t).$$
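
For a discrete factor, the second step of both procedures is simply a probability-weighted average over the states of $V$. The sketch below shows this for the price (17.43) and the investor default intensity (17.44); it assumes that the full-information price function `h`, the hazard function `hazard` and the conditional probabilities `posterior` are supplied by the user, and all names are hypothetical.

```python
def posterior_average(values, posterior):
    """Sum_k values[k] * Q(V = v_k | G_t) for a discrete factor."""
    if abs(sum(posterior) - 1.0) > 1e-9:
        raise ValueError("posterior probabilities must sum to one")
    return sum(x * p for x, p in zip(values, posterior))


def investor_intensity(hazard, states, t, posterior):
    """lambda_{t,i} = sum_k gamma_i(v_k, t) * Q(V = v_k | G_t), cf. (17.44)."""
    return posterior_average([hazard(v, t) for v in states], posterior)


def investor_price(h, states, t, y, posterior):
    """V_t = sum_k h(t, v_k, Y_t) * Q(V = v_k | G_t), cf. (17.43)."""
    return posterior_average([h(t, v, y) for v in states], posterior)
```

The design separates the model-specific inputs (the functions `h` and `hazard`) from the filtering output (`posterior`), which mirrors the two-step procedure described in the text.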


17.4.2 Pure Default Information


In this section we consider the case where the only information available to investors
is the default history of the portfolio. This is modelled by setting $(\mathcal{G}_t) = (\mathcal{H}_t)$.


Recursive computation of $g_{V \mid \mathcal{H}_t}$. We concentrate on the case where $V$ has a density;
the discrete case can be handled by analogous arguments. We will compute the
conditional density $g_{V \mid \mathcal{H}_t}$ on the sets $\{t < T_1\}$ (before the first default) and
$\{T_1 \le t < T_2\}$ (after the first default and before the second).

Let $A \subset \mathbb{R}$ be a measurable set. We first note that on the set $\{t < T_1\}$ we have
$Q(V \in A \mid \mathcal{H}_t) = Q(V \in A \mid T_1 > t)$. Now

$$Q(V \in A \mid T_1 > t) = \frac{Q(\{V \in A\} \cap \{T_1 > t\})}{Q(T_1 > t)} \tag{17.45}$$

and, using the conditional independence of defaults given $V$, the numerator may be
calculated to be

$$
\begin{aligned}
Q(\{V \in A\} \cap \{T_1 > t\}) &= \int_A Q(T_1 > t \mid V = v)\, g_V(v)\,dv \\
&= \int_A \prod_{i=1}^{m} Q(\tau_i > t \mid V = v)\, g_V(v)\,dv \\
&= \int_A \exp\Big(-\int_0^t \sum_{i=1}^{m} \gamma_i(v, s)\,ds\Big) g_V(v)\,dv.
\end{aligned}
$$


It follows from (17.45) that on $\{t < T_1\}$ the conditional df $G_{V \mid \mathcal{H}_t}(v)$ is given by

$$G_{V \mid \mathcal{H}_t}(v) = \frac{1}{Q(T_1 > t)} \int_{-\infty}^{v} \exp\Big(-\int_0^t \sum_{i=1}^{m} \gamma_i(w, s)\,ds\Big) g_V(w)\,dw,$$

and hence the conditional density $g_{V \mid \mathcal{H}_t}(v)$ satisfies

$$g_{V \mid \mathcal{H}_t}(v) \propto \exp\Big(-\int_0^t \sum_{i=1}^{m} \gamma_i(v, s)\,ds\Big) g_V(v),$$

where $\propto$ stands for “is proportional to”. The constant of proportionality is of course
given by $(Q(T_1 > t))^{-1}$, but this quantity is irrelevant for our subsequent analysis.
To simplify notation we define the quantity

$$\bar\gamma(v, s) = \sum_{i=1}^{m} (1 - Y_{s,i})\gamma_i(v, s), \tag{17.46}$$



and with this notation we have that on $\{t < T_1\}$

$$g_{V \mid \mathcal{H}_t}(v) \propto \exp\Big(-\int_0^t \bar\gamma(v, s)\,ds\Big) g_V(v). \tag{17.47}$$

Next we calculate $g_{V \mid \mathcal{H}_t}(v)$ on the set $\{T_1 \le t < T_2\}$. As an intermediate step
we derive $g_{V \mid \tau_j}(v \mid t)$, the conditional density of $V$ given the default time $\tau_j$ of an
arbitrary firm $j$ in the portfolio. Applying the conditional density formula (see, for
instance, (6.2)) to the conditional density $f_{\tau_j \mid V}(t \mid v)$ given in (17.40), we obtain

$$g_{V \mid \tau_j}(v \mid t) \propto f_{\tau_j \mid V}(t \mid v)\, g_V(v) = \gamma_j(v, t) \exp\Big(-\int_0^t \gamma_j(v, s)\,ds\Big) g_V(v). \tag{17.48}$$
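The available information at $t$ consists of the rvs $T_1$ and $\xi_1$ (the default time and
identity of the firm defaulting first) and the event $B_t := \{\tau_i > t,\ i \ne \xi_1\}$ (the
knowledge that the other firms have not yet defaulted at $t$). We can therefore write
$Q(V \in A \mid \mathcal{H}_t) = Q(V \in A \mid B_t, T_1, \xi_1)$ on $\{T_1 \le t < T_2\}$ and we note that

$$Q(V \in A \mid B_t, T_1, \xi_1) = \frac{Q(\{V \in A\} \cap B_t \mid T_1, \xi_1)}{Q(B_t \mid T_1, \xi_1)} \propto Q(\{V \in A\} \cap B_t \mid T_1, \xi_1).$$

Now we calculate that

$$Q(\{V \in A\} \cap B_t \mid T_1, \xi_1) = \int_A \prod_{i \ne \xi_1} \exp\Big(-\int_0^t \gamma_i(v, s)\,ds\Big) g_{V \mid \tau_{\xi_1}}(v \mid T_1)\,dv,$$

and we can use (17.48) to infer that this probability is proportional to

$$\int_A \gamma_{\xi_1}(v, T_1) \exp\Big(-\int_0^{T_1} \sum_{i=1}^{m} \gamma_i(v, s)\,ds\Big) \exp\Big(-\int_{T_1}^{t} \sum_{i \ne \xi_1} \gamma_i(v, s)\,ds\Big) g_V(v)\,dv.$$

Using the notation (17.46) we find that on $\{T_1 \le t < T_2\}$

$$g_{V \mid \mathcal{H}_t}(v) \propto \gamma_{\xi_1}(v, T_1) \exp\Big(-\int_0^t \bar\gamma(v, s)\,ds\Big) g_V(v). \tag{17.49}$$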


If we compare (17.47) and (17.49), we can see the impact of the additional default
information at $T_1$ on the conditional distribution of $V$. At $T_1$ the “a posteriori density”
$g_{V \mid \mathcal{H}_{T_1}}(v)$ is proportional to the product of the hazard rate $\gamma_{\xi_1}(v, T_1)$ and the “a priori
density” $g_{V \mid \mathcal{H}_{T_1-}}(v)$ just before $T_1$. In the following two examples this result will be
used to derive explicit expressions for information-driven contagion effects.

The above analysis is a relatively simple example of a stochastic filtering problem.
In general, in a filtering problem, we consider a stochastic process $(\Psi_t)$ (a random
variable is a special case) and a filtration $(\mathcal{G}_t)$ such that $(\Psi_t)$ is not $(\mathcal{G}_t)$-adapted. We
attempt to estimate the conditional distribution of $\Psi_t$ for $t \ge 0$ given the $\sigma$-algebra
$\mathcal{G}_t$ in a recursive way.


Example 17.15 (Clayton copula model). For factor copula models with a Clayton
survival copula, the conditional density $g_{V \mid \mathcal{H}_t}(v)$ and the default intensities $(\lambda_{t,i})$
can be computed explicitly. The Clayton copula model with parameter $\theta > 0$ is
an LT-Archimedean copula model (see Example 12.5) with $V \sim \mathrm{Ga}(1/\theta, 1)$. We
denote the Laplace–Stieltjes transform of the distribution of $V$ and its functional
inverse by $\hat G_V$ and $\hat G_V^{-1}$, respectively, and we recall that the density $g(v; \alpha, \beta)$ of
the $\mathrm{Ga}(\alpha, \beta)$ distribution satisfies $g(v; \alpha, \beta) \propto v^{\alpha - 1} e^{-\beta v}$ and has mean $\alpha/\beta$ (see
Appendix A.2.4).

As shown in Example 12.5, in threshold copula models with an LT-Archimedean
copula the conditional survival function is given by

$$\bar F_{\tau_i \mid V}(t \mid v) = \exp\big(-v\, \hat G_V^{-1}(\bar F_i(t))\big),$$

where $\bar F_i(t)$ denotes the marginal survival function. The conditional hazard function
$\gamma_i(v, t)$ is therefore given by

$$\gamma_i(v, t) = -\frac{d}{dt} \ln \bar F_{\tau_i \mid V}(t \mid v) = v\, \frac{d}{dt} \hat G_V^{-1}(\bar F_i(t)). \tag{17.50}$$


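Since $Q(T_1 > t \mid V = v) = \exp\big(-v \sum_{i=1}^{m} \hat G_V^{-1}(\bar F_i(t))\big)$, we can use (17.47) to
obtain

$$g_{V \mid \mathcal{H}_t}(v) \propto v^{1/\theta - 1} \exp\Big(-v\Big(1 + \sum_{i=1}^{m} \hat G_V^{-1}(\bar F_i(t))\Big)\Big), \quad t < T_1. \tag{17.51}$$

Hence, for $t < T_1$, the conditional distribution of $V$ given $\mathcal{H}_t$ is again a gamma
distribution but now with parameters $\alpha = 1/\theta$ and $\beta = 1 + \sum_{i=1}^{m} \hat G_V^{-1}(\bar F_i(t))$.
Since the mean of this distribution is $\alpha/\beta$, we see that the conditional mean of
$V$ given $T_1 > t$ is lower than the unconditional mean of $V$. This is in line with
economic intuition; indeed, the fact that $T_1 > t$ is “good news” for the portfolio.
Next we turn to the updating at $t = T_1$. Using (17.49), (17.50) and (17.51) we
find that

$$g_{V \mid \mathcal{H}_{T_1}}(v) \propto v^{(1 + 1/\theta) - 1} \exp\Big(-v\Big(1 + \sum_{i=1}^{m} \hat G_V^{-1}(\bar F_i(T_1))\Big)\Big), \tag{17.52}$$

so that given $\mathcal{H}_{T_1}$, $V$ follows a gamma distribution with parameters $\alpha = 1 + 1/\theta$
and $\beta = 1 + \sum_{i=1}^{m} \hat G_V^{-1}(\bar F_i(T_1))$. At $T_1$ the conditional mean $\mu_{V \mid \mathcal{H}_t}$ of $V$ jumps
upwards. We have the formulas

$$\lim_{t \to T_1-} \mu_{V \mid \mathcal{H}_t} = \frac{1/\theta}{1 + \sum_{i=1}^{m} \hat G_V^{-1}(\bar F_i(T_1))}, \qquad \mu_{V \mid \mathcal{H}_{T_1}} = \frac{1 + 1/\theta}{1 + \sum_{i=1}^{m} \hat G_V^{-1}(\bar F_i(T_1))}.$$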

Readers familiar with Bayesian statistics may note that the explicit form of the 
updating rules is due to the fact that the gamma family is a conjugate family for the 
exponential distribution. 

Finally, we consider the $(\mathcal{H}_t)$ default intensity process $(\lambda_{t,i})$. Since $\gamma_i(v, t) \propto v$,
we use (17.44) to infer that $\lambda_{t,i}$ is proportional to $E(V \mid \mathcal{H}_t)$, the conditional mean
of $V$ given $\mathcal{H}_t$. In particular, $\lambda_{t,i}$ jumps upwards at each successive default time
and decreases gradually between defaults. This is illustrated in Figure 17.2, where
we consider an explicit example in which, for all $i$, the marginal survival functions
are given by $\bar F_i(t) = e^{-\gamma t}$ for a fixed default rate $\gamma$. The parameter $\theta$ is chosen to
obtain a desired level of default correlation.
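
The gamma updating rules translate directly into a formula for the intensity in the homogeneous example. With $V \sim \mathrm{Ga}(1/\theta, 1)$ we have $\hat G_V(s) = (1 + s)^{-1/\theta}$ and hence $\hat G_V^{-1}(u) = u^{-\theta} - 1$, so that for $\bar F_i(t) = e^{-\gamma t}$ the hazard function in (17.50) becomes $\gamma_i(v, t) = v\,\gamma\theta e^{\gamma\theta t}$. The sketch below combines this with the conditional means before and at the first default; the function names and the numerical value of $\theta$ used in any call are hypothetical.

```python
import math


def clayton_intensity_before_default(t, gamma, theta, m):
    """(H_t) default intensity on {t < T_1} when all marginals are exp(-gamma * t)."""
    growth = math.exp(gamma * theta * t)  # equals 1 + G_V^{-1}(exp(-gamma * t))
    alpha = 1.0 / theta
    beta = 1.0 + m * (growth - 1.0)
    return (alpha / beta) * gamma * theta * growth


def clayton_intensity_at_first_default(t1, gamma, theta, m):
    """Intensity of a surviving firm at T_1 = t1: alpha moves from 1/theta to 1 + 1/theta."""
    growth = math.exp(gamma * theta * t1)
    alpha = 1.0 + 1.0 / theta
    beta = 1.0 + m * (growth - 1.0)
    return (alpha / beta) * gamma * theta * growth
```

In this homogeneous setting the intensity starts at $\gamma$, drifts downwards while no default occurs, and is multiplied by the factor $1 + \theta$ at the first default, so a larger $\theta$ produces a stronger contagion effect.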


Example 17.16 (implied copula model). In this example we take a closer look at
the case where $V$ is discrete with state space $S^V = \{v_1, \dots, v_K\}$ and probability
mass function $\pi$. In this way we obtain a dynamic version of the implied copula
model from Section 12.3.3.

Figure 17.2. Paths of the default intensity $(\lambda_t)$ in the Clayton copula model, assuming that
the first default time $T_1$ is four months. The parameters are as follows: portfolio size $m = 100$;
marginal default rate $\gamma = 0.02$; $\theta$ is chosen to give one-year default correlations of 2% and
0.5%. As we expect, a higher default correlation implies a stronger contagion effect.

Using similar arguments to those used to derive (17.47) and (17.49) gives, for
$t < T_1$, that

$$Q(V = v_k \mid \mathcal{H}_t) = \frac{\exp\big(-\int_0^t \bar\gamma(v_k, s)\,ds\big)\pi_k}{\sum_{l=1}^{K} \exp\big(-\int_0^t \bar\gamma(v_l, s)\,ds\big)\pi_l}, \quad t < T_1; \tag{17.53}$$

at $T_1$ we have

$$Q(V = v_k \mid \mathcal{H}_{T_1}) \propto \gamma_{\xi_1}(v_k, T_1) \exp\Big(-\int_0^{T_1} \bar\gamma(v_k, s)\,ds\Big)\pi_k. \tag{17.54}$$
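
The two updating rules (17.53) and (17.54) can be sketched in a few lines for the special case of hazard rates that do not depend on time, so that $\int_0^t \bar\gamma(v_k, s)\,ds = t\,\bar\gamma(v_k)$ before the first default. The time-homogeneity and all function names are assumptions made for this illustration.

```python
import math


def normalise(weights):
    """Rescale non-negative weights so that they sum to one."""
    total = sum(weights)
    return [w / total for w in weights]


def posterior_before_first_default(prior, total_hazards, t):
    """(17.53) with constant hazards: weight_k = pi_k * exp(-total_hazards[k] * t).

    total_hazards[k] is the sum over all firms of gamma_i(v_k).
    """
    return normalise([p * math.exp(-g * t) for p, g in zip(prior, total_hazards)])


def posterior_at_first_default(prior, total_hazards, defaulter_hazards, t1):
    """(17.54): multiply the pre-default weights by the defaulter's hazard in each state."""
    pre = posterior_before_first_default(prior, total_hazards, t1)
    return normalise([q * g for q, g in zip(pre, defaulter_hazards)])
```

With a "good" and a "bad" state, the probability of the bad state declines while no default is observed and increases sharply when the first default arrives, which is the mechanism behind information-driven contagion.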


Set $\pi^k_t := Q(V = v_k \mid \mathcal{H}_t)$, $1 \le k \le K$. We now use relations (17.53) and
(17.54) to derive a system of stochastic differential equations for the process $\pi_t =
(\pi^1_t, \dots, \pi^K_t)'$, which is driven by time and the default indicator process $(Y_t)$. This
helps to understand the dynamics of the default intensities and of credit derivatives.

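First we consider the dynamics of the process $(\pi_t)$ between default times. Define
$E^k_t := \exp(-\int_0^t \bar\gamma(v_k, s)\,ds)$. With this abbreviation we get, for $t < T_1$, the
relationship $\pi^k_t = \pi_k E^k_t / (\sum_{l=1}^{K} \pi_l E^l_t)$, and hence

$$
\begin{aligned}
\frac{d\pi^k_t}{dt} &= \frac{-\pi_k \bar\gamma(v_k, t) E^k_t \big(\sum_{l=1}^{K} \pi_l E^l_t\big) + \pi_k E^k_t \sum_{l=1}^{K} \pi_l \bar\gamma(v_l, t) E^l_t}{\big(\sum_{l=1}^{K} \pi_l E^l_t\big)^2} \\
&= \pi^k_t \Big(-\bar\gamma(v_k, t) + \frac{\sum_{l=1}^{K} \pi_l \bar\gamma(v_l, t) E^l_t}{\sum_{l=1}^{K} \pi_l E^l_t}\Big) \\
&= \pi^k_t \Big(-\bar\gamma(v_k, t) + \sum_{l=1}^{K} \pi^l_t \bar\gamma(v_l, t)\Big), \quad 1 \le k \le K.
\end{aligned}
\tag{17.55}
$$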
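
As a consistency check between the closed-form expression for $\pi^k_t$ and the ODE system (17.55), the following sketch integrates the ODE with an explicit Euler scheme and compares the result with the normalized exponential weights. It assumes time-constant hazards $\bar\gamma(v_k)$ and a period without defaults; the function names and step count are hypothetical choices.

```python
import math


def closed_form_posterior(prior, total_hazards, t):
    """pi^k_t = pi_k * E^k_t / sum_l pi_l * E^l_t with E^k_t = exp(-total_hazards[k] * t)."""
    weights = [p * math.exp(-g * t) for p, g in zip(prior, total_hazards)]
    total = sum(weights)
    return [w / total for w in weights]


def euler_posterior(prior, total_hazards, t, steps=10000):
    """Explicit Euler scheme for d pi^k / dt = pi^k * (-gbar_k + sum_l pi^l * gbar_l)."""
    pi = list(prior)
    dt = t / steps
    for _ in range(steps):
        mean_hazard = sum(p * g for p, g in zip(pi, total_hazards))
        pi = [p + dt * p * (mean_hazard - g) for p, g in zip(pi, total_hazards)]
    return pi
```

The Euler update preserves the sum of the probabilities up to rounding error, because the increments $\pi^k(\sum_l \pi^l \bar\gamma_l - \bar\gamma_k)$ add up to zero across states.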


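Note that (17.55) is a system of $K$ ordinary differential equations for $(\pi_t)$, so the
process $(\pi_t)$ evolves in a deterministic manner up to time $T_1$. Next we consider the
case $t = T_1$. We will use the general notation $\pi^k_{t-} := \lim_{s \to t-} \pi^k_s$. From (17.54) we
get that

$$\pi^k_{T_1} = \frac{\pi_k \gamma_{\xi_1}(v_k, T_1) E^k_{T_1}}{\sum_{l=1}^{K} \pi_l \gamma_{\xi_1}(v_l, T_1) E^l_{T_1}} = \frac{\pi^k_{T_1-} \gamma_{\xi_1}(v_k, T_1)}{\sum_{l=1}^{K} \pi^l_{T_1-} \gamma_{\xi_1}(v_l, T_1)},$$

where the second equality follows since $\pi^k_{T_1-} = \pi_k E^k_{T_1} / (\sum_{l=1}^{K} \pi_l E^l_{T_1})$. Observe that
at $t = T_1$ the process $(\pi^k_t)$ has a jump of size

$$\pi^k_{T_1-} \Big(\frac{\gamma_{\xi_1}(v_k, T_1)}{\sum_{l=1}^{K} \gamma_{\xi_1}(v_l, T_1) \pi^l_{T_1-}} - 1\Big). \tag{17.56}$$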
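Combining (17.55) and (17.56) we therefore obtain the following $K$-dimensional
system of stochastic differential equations for the dynamics of $(\pi_t)$ for all $t$:

$$
\begin{aligned}
d\pi^k_t ={}& \pi^k_{t-} \Big(-\bar\gamma(v_k, t) + \sum_{l=1}^{K} \pi^l_{t-} \bar\gamma(v_l, t)\Big)\,dt \\
&+ \sum_{j=1}^{m} \pi^k_{t-} \Big(\frac{\gamma_j(v_k, t)}{\sum_{l=1}^{K} \gamma_j(v_l, t) \pi^l_{t-}} - 1\Big)\,dY_{t,j}, \quad 1 \le k \le K.
\end{aligned}
\tag{17.57}
$$

Recall that $\lambda_{t,i} = \sum_{k=1}^{K} \pi^k_t \gamma_i(v_k, t)$ is the $(\mathcal{H}_t)$ default intensity of firm $i$. Define
the compensated default indicator processes by

$$M_{t,i} = Y_{t,i} - \int_0^t (1 - Y_{s-,i}) \lambda_{s-,i}\,ds = Y_{t,i} - \int_0^t (1 - Y_{s-,i}) \sum_{k=1}^{K} \pi^k_{s-} \gamma_i(v_k, s)\,ds. \tag{17.58}$$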
Next we show that the system (17.57) can be simplified by using the processes
$(M_{t,1}), \dots, (M_{t,m})$ as drivers. This representation will be useful to see the link
between our analysis and the more general filtering model considered in Section 17.4.3.
To this end, note that the $dt$ term in (17.57) is equal to

$$\sum_{j=1}^{m} \pi^k_{t-} \Big(\frac{\gamma_j(v_k, t)}{\sum_{l=1}^{K} \gamma_j(v_l, t) \pi^l_{t-}} - 1\Big) \Big(-(1 - Y_{t-,j}) \sum_{l=1}^{K} \pi^l_{t-} \gamma_j(v_l, t)\Big)\,dt.$$

Using the compensated default indicator processes as drivers, the dynamics of $(\pi_t)$
can therefore be rewritten in the following simpler form:

$$d\pi^k_t = \sum_{j=1}^{m} \pi^k_{t-} \Big(\frac{\gamma_j(v_k, t)}{\sum_{l=1}^{K} \gamma_j(v_l, t) \pi^l_{t-}} - 1\Big)\,dM_{t,j}. \tag{17.59}$$


Default intensities and contagion. Equation (17.59) determines the dynamics of 
the (#,) default intensities. As in the Clayton copula model, (A;,;) evolves deter- 
ministically between default events and jumps at the random default times in the 
portfolio. It is possible to give an explicit expression for the size of this jump and 
hence for the contagion effects induced by incomplete information, as we now show. 
Consider two firms $i \ne j$. It follows from (17.59) that the jump in the default 
intensity of firm $i$ at the default time $\tau_j$ of firm $j$ is given by 

$$\lambda_{\tau_j,i} - \lambda_{\tau_j-,i} = \sum_{k=1}^K \gamma_i(v_k, \tau_j)\, \pi_{\tau_j-}^k \left( \frac{\gamma_j(v_k, \tau_j)}{\sum_{l=1}^K \pi_{\tau_j-}^l \gamma_j(v_l, \tau_j)} - 1 \right) = \frac{\operatorname{cov}^{\pi_{\tau_j-}}(\gamma_i, \gamma_j)}{\lambda_{\tau_j-,j}}. \tag{17.60}$$

Here, $\operatorname{cov}^{\pi}$ denotes the covariance with respect to the probability measure $\pi$ on $S^V$, 
and $\gamma_i$ is shorthand for the random variable $v \mapsto \gamma_i(v, \tau_j)$. 

Formula (17.60) makes two very intuitive predictions about default contagion. 
First, all other things being equal, default contagion increases with increasing 
correlation of the random variables $\gamma_i$ and $\gamma_j$ under $\pi_{\tau_j-}$, i.e. contagion effects are 
strongest for obligors with similar characteristics. Second, contagion effects are 
inversely proportional to $\lambda_{\tau_j-,j}$, the default intensity of the defaulting entity. In 
particular, the default of an entity $j$ with high credit quality and, hence, a low value of 
$\lambda_{\tau_j-,j}$ has a comparatively large impact on the market, perhaps because the default 
comes as a surprise to market participants. 
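
The contagion formula (17.60) can be checked numerically. The following is a minimal sketch, assuming a three-state factor with made-up probabilities and intensities (the names `pi`, `gamma_i`, `gamma_j` and all numbers are illustrative and not taken from the text). It computes the jump in the intensity of firm $i$ once by updating the factor distribution at the default of firm $j$ and once via the covariance expression.

```python
def intensity(pi, gamma):
    """Default intensity: sum_k pi^k * gamma(v_k)."""
    return sum(p * g for p, g in zip(pi, gamma))


def posterior_after_default(pi, gamma_j):
    """Factor distribution just after firm j defaults (jump part of (17.59))."""
    lam_j = intensity(pi, gamma_j)
    return [p * g / lam_j for p, g in zip(pi, gamma_j)]


def contagion_jump(pi, gamma_i, gamma_j):
    """Jump in the intensity of firm i at the default of firm j."""
    pi_new = posterior_after_default(pi, gamma_j)
    return intensity(pi_new, gamma_i) - intensity(pi, gamma_i)


def covariance_formula(pi, gamma_i, gamma_j):
    """Right-hand side of (17.60): cov(gamma_i, gamma_j) / lambda_j."""
    e_i = intensity(pi, gamma_i)
    e_j = intensity(pi, gamma_j)
    e_ij = sum(p * gi * gj for p, gi, gj in zip(pi, gamma_i, gamma_j))
    return (e_ij - e_i * e_j) / e_j


# Hypothetical inputs: probabilities just before the default and
# state-dependent intensities of the two firms.
pi = [0.5, 0.3, 0.2]
gamma_i = [0.01, 0.03, 0.10]
gamma_j = [0.02, 0.05, 0.20]

jump = contagion_jump(pi, gamma_i, gamma_j)
check = covariance_formula(pi, gamma_i, gamma_j)
```

In this example the two firms have comonotonic intensities, so the covariance and hence the jump are positive.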


17.4.3 Additional Information 


In the dynamic version of the implied copula model studied in Example 17.16, 
default intensities (and hence credit spreads) evolve deterministically between 
defaults. This unrealistic behaviour is due to the assumption that the investor 
filtration is the default history $(\mathcal{H}_t)$. With this choice of filtration, significant new 
information enters the model only at default times. In this section we discuss an 
extension of the model with a richer investor filtration $(\mathcal{G}_t)$ that is due to Frey and 
Schmidt (2012). 


The set-up. As in Example 17.16 we assume that the factor is a discrete rv $V$ taking 
values in the set $S^V = \{v_1, \dots, v_K\}$. The investor filtration $(\mathcal{G}_t)$ is generated by the 
default history of the firms in the portfolio and (this is the new part) an additional 
process $(Z_t)$ representing observations of $V$ with additive noise. Formally, we set 
$(\mathcal{G}_t) = (\mathcal{H}_t) \vee (\mathcal{F}_t^Z)$, where $(Z_t)$ is of the form 

$$Z_t = \int_0^t a(V, s)\, ds + \sigma B_t. \tag{17.61}$$

Here, $(B_t)$ is a Brownian motion, independent of $V$ and the default indicator process 
$(Y_t)$, $a(\cdot)$ is a function from $S^V \times [0, \infty)$ to $\mathbb{R}$, and $\sigma$ is a scaling parameter that 
modulates the effect of the noise. 

Equation (17.61) is the standard way of incorporating noisy information about $V$ 
into a continuous-time stochastic filtering model. To develop more intuition for the 
model we show how (17.61) arises as the limit of a simpler discrete-time model. 
Suppose that investors receive noisy information about $V$ at discrete time points 
$t_k = k\Delta$, $k = 1, 2, \dots$, and that this information takes the form $X_k = a(V, t_k) + \epsilon_k$ 
for an iid sequence of noise variables $\epsilon_k$, independent of $V$, with mean 0 and variance 
$\sigma_\epsilon^2 > 0$. Now define the scaled cumulative observation process $Z_t^\Delta := \Delta \sum_{t_k \le t} X_k$ 
and let $\sigma = \sqrt{\Delta \sigma_\epsilon^2}$. For $\Delta$ small we have the approximation 

$$Z_t^\Delta = \sum_{t_k \le t} \Delta\, a(V, t_k) + \Delta \sum_{t_k \le t} \epsilon_k \approx \int_0^t a(V, s)\, ds + \sigma B_t. \tag{17.62}$$
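
The choice of the scaling constant in this approximation can be motivated by a short variance calculation; the following is a sketch of the reasoning rather than a formal limit theorem. Writing $\sigma_\epsilon^2$ for the variance of the noise variables and using their independence, the noise term of the scaled cumulative observation process satisfies

$$\operatorname{var}\Big( \Delta \sum_{t_k \le t} \epsilon_k \Big) = \Delta^2 \lfloor t/\Delta \rfloor\, \sigma_\epsilon^2 \approx \Delta \sigma_\epsilon^2\, t = \sigma^2 t = \operatorname{var}(\sigma B_t),$$

so that the accumulated noise has, up to rounding of $t/\Delta$, the same variance as $\sigma B_t$ when $\sigma^2 = \Delta \sigma_\epsilon^2$. The first sum in the approximation is a Riemann sum for the integral of $a(V, s)$ over $[0, t]$.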


Without loss of generality we set $\sigma = 1$; other values of $\sigma$ could be incorporated by 
rescaling the function $a(\cdot)$. In the numerical examples considered below we assume 
$a(v_k, t) = c \ln \gamma(v_k, t)$, where $\gamma(v_k, t)$ is as defined in (17.46) and the constant 
$c \ge 0$ models the information content of $(Z_t)$; for $c = 0$, $(Z_t)$ carries no information, 
whereas, for $c$ large, the state of $V$ can be observed with high precision. The process 
$(Z_t)$ is an abstract model-building device that represents information contained in 
security prices; it is not directly linked to observable economic quantities. We come 
back to this point when we discuss calibration strategies for the model. 


Dynamics of $(\pi_t)$. As before we use the notation $\pi_t^k = Q(V = v_k \mid \mathcal{G}_t)$, $1 \le k \le K$, 
and $\pi_t = (\pi_t^1, \dots, \pi_t^K)'$. A crucial part of the analysis of Frey and Schmidt 
(2012) is the derivation of a stochastic differential equation for the dynamics of 
the process $(\pi_t)$. We begin by introducing the processes that drive this equation. 
According to Lemma 17.14, the $(\mathcal{G}_t)$ default intensity of firm $i$ is given by 
$\lambda_{t,i} = \sum_{k=1}^K \pi_t^k \gamma_i(v_k, t)$, and the compensated default indicator processes are given by the 
martingales $(M_{t,i})$ introduced in (17.58). Moreover, we define the process 

$$W_t = Z_t - \int_0^t \sum_{k=1}^K \pi_s^k a(v_k, s)\, ds, \quad t \ge 0. \tag{17.63}$$

In intuitive terms we have the relationship 

$$E(dZ_t \mid \mathcal{G}_t) = E(a(V, t)\, dt \mid \mathcal{G}_t) = \sum_{k=1}^K \pi_t^k a(v_k, t)\, dt.$$

Hence, $dW_t = dZ_t - E(dZ_t \mid \mathcal{G}_t)$, so the increment $dW_t$ represents the 
unpredictable part of the new information $dZ_t$. For this reason, $(W_t)$ is called the 
innovations process in the literature on stochastic filtering. It is well known that $(W_t)$ is a 
$(\mathcal{G}_t)$-Brownian motion (for a formal proof see, for example, Bain and Crisan (2009) 
or Davis and Marcus (1981)). 

The following result from Frey and Schmidt (2012) generalizes equation (17.59). 


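Proposition 17.17. The process $(\pi_t) = (\pi_t^1, \dots, \pi_t^K)'$ solves the SDE system 

$$d\pi_t^k = \sum_{i=1}^m \pi_{t-}^k \left( \frac{\gamma_i(v_k, t)}{\sum_{l=1}^K \pi_{t-}^l \gamma_i(v_l, t)} - 1 \right) dM_{t,i} + \pi_{t-}^k \left( a(v_k, t) - \sum_{l=1}^K \pi_{t-}^l a(v_l, t) \right) dW_t, \tag{17.64}$$

$1 \le k \le K$, with initial condition $\pi_0 = \pi$. 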
Figure 17.3. A simulated path for the default intensity in the model with incomplete 
information where additional information is modelled by the noisy observation process $(Z_t)$. The 
graph was created using the set-up of Example 12.7. 


In stochastic filtering, equation (17.64) is known as the Kushner-Stratonovich 
equation. We omit the formal proof of the proposition but try to give some intuitive 
explanation for equation (17.64). The jump part of this equation has the same form 
as in equation (17.59). This term represents the impact of default information on 
$\pi_t$ and it is responsible for contagion effects due to defaults. Next we consider the 
diffusion part. Define random variables $\tilde{a}$ and $I_k \colon S^V \to \mathbb{R}$ by $\tilde{a}(v) = a(v, t)$ and 
$I_k(v) = I_{\{v = v_k\}}$. The coefficient of the diffusion part can then be written in the form 

$$E^{\pi_t}(I_k \tilde{a}) - E^{\pi_t}(I_k)\, E^{\pi_t}(\tilde{a}) = \operatorname{cov}^{\pi_t}(I_k, \tilde{a}).$$

It follows that a positive increment $dW_t$ of the innovations process leads to an 
increase in $\pi_t^k$ if the rvs $I_k$ and $\tilde{a}$ are positively correlated under the measure $\pi_t$. 

The Kushner-Stratonovich equation (17.64) lends itself to simulation. In 
particular, the equation can be used to generate trajectories of $(\pi_t)$ and of the default 
intensities $\lambda_{t,i} = \sum_{k=1}^K \pi_t^k \gamma_i(v_k, t)$, $1 \le i \le m$. Details are given in Frey and 
Schmidt (2011). 
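
To illustrate how such a simulation might be organized, the following is a minimal sketch of one explicit Euler step for (17.64) with $\sigma = 1$. It is not the scheme of Frey and Schmidt (2011); the function name `ks_euler_step`, the time-homogeneous intensities, the choice of firm 0's intensities inside $a(\cdot)$ and all numbers are assumptions made for illustration. An Euler step preserves the total mass of the probability vector but does not guarantee positivity for large step sizes.

```python
import math
import random


def ks_euler_step(pi, gamma, a, alive, new_defaults, dz, dt):
    """One explicit Euler step for (17.64) with sigma = 1.

    pi           -- current probabilities pi^k (summing to 1)
    gamma        -- gamma[i][k] = gamma_i(v_k, t) for firm i
    a            -- a[k] = a(v_k, t)
    alive        -- alive[i] is True if firm i has not defaulted before the step
    new_defaults -- set of firm indices defaulting during the step
    dz           -- increment of the observation process Z over the step
    dt           -- step size
    """
    K = len(pi)
    a_bar = sum(pi[k] * a[k] for k in range(K))
    dw = dz - a_bar * dt  # innovation increment dW = dZ - E(dZ | G_t)
    new_pi = list(pi)
    for i, gam_i in enumerate(gamma):
        if not alive[i]:
            continue  # dM_{t,i} vanishes once firm i has defaulted
        lam_i = sum(pi[k] * gam_i[k] for k in range(K))
        dy = 1.0 if i in new_defaults else 0.0
        dm = dy - lam_i * dt  # compensated default indicator increment
        for k in range(K):
            new_pi[k] += pi[k] * (gam_i[k] / lam_i - 1.0) * dm
    for k in range(K):
        new_pi[k] += pi[k] * (a[k] - a_bar) * dw
    return new_pi


# Hypothetical two-firm, three-state example without defaults on the path.
gamma = [[0.01, 0.03, 0.10], [0.02, 0.05, 0.20]]
c = 0.5
a = [c * math.log(g) for g in gamma[0]]  # assumption: a(v_k) = c ln gamma_0(v_k)
pi = [0.5, 0.3, 0.2]
true_state, dt = 2, 0.01
rng = random.Random(1)
for _ in range(200):
    dz = a[true_state] * dt + math.sqrt(dt) * rng.gauss(0.0, 1.0)
    pi = ks_euler_step(pi, gamma, a, [True, True], set(), dz, dt)
intensities = [sum(p * g for p, g in zip(pi, gam_i)) for gam_i in gamma]
```

A full simulation would additionally draw default times from the true-state intensities and pass the defaulting firms through `new_defaults`.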

The extension of the investor information to the larger information set $(\mathcal{G}_t)$ leads 
to rich and realistic dynamics for default intensities incorporating both random 
fluctuations of credit spreads between defaults and default contagion. This is illustrated 
in Figure 17.3, where we plot a typical simulated trajectory of the default intensity. 
The fluctuation of the intensity between defaults as well as the contagion effects at 
default times (e.g. around t = 600) can be clearly seen. 


Pricing of credit derivatives. All commonly encountered credit-risky instruments, 
credit derivatives and credit value adjustments can be categorized into the following 
two classes. 
Basic credit products: this class comprises products where the cash-flow stream 
depends on the default history of the underlying portfolio and is therefore 
$(\mathcal{H}_t)$-adapted. Examples are corporate bonds, CDSs and CDOs. 


Options on traded credit products: this class contains derivatives whose pay-off 
depends on the future market value of basic credit products. Examples include 
options on corporate bonds, options on CDS indices, and credit value adjustments 
for credit derivatives, as discussed in Section 17.2. 


The pricing methodology for these product classes differs and we discuss them 
separately. We begin with basic credit products. Denote the associated stream of 
discounted future cash flows by the $\mathcal{H}_T$-measurable rv $\Pi(t, T)$. Then, using 
risk-neutral pricing the price at time $t$ of the basic credit product is given by 
$V_t = E^Q(\Pi(t, T) \mid \mathcal{G}_t)$. Recall that $\mathcal{G}_t = \mathcal{H}_t \vee \mathcal{F}_t^Z$. Using a similar approach to the 
derivation of (17.43) we obtain 

$$V_t = \sum_{k=1}^K h(t, v_k, Y_t)\, \pi_t^k, \tag{17.65}$$

where $h(t, V, Y_t) = E^Q(\Pi(t, T) \mid \tilde{\mathcal{G}}_t)$, the hypothetical value of the claim for 
known $V$. Note that $V_t$ depends only on $Y_t$, $\pi_t$ and the function $h(\cdot)$; the precise 
form of the function $a(\cdot)$ in (17.61) is irrelevant for the pricing of basic credit 
products. 

Next we consider options on traded credit products. Assume that $N$ basic credit 
products, such as CDSs, CDO tranches or index swaps, are traded on the market, 
and denote their ex-dividend prices at time $t$ by $V_{t,1}, \dots, V_{t,N}$. The pay-off of an 
option on a traded credit product then takes the form $I_{\{\tilde{T} \le T\}}\, g(Y_{\tilde{T}}, V_{\tilde{T},1}, \dots, V_{\tilde{T},N})$, 
where $\tilde{T}$ is a $(\mathcal{G}_t)$ stopping time at which the pay-out is made and $T$ is the maturity 
of the contract. A prime example is the credit value adjustment for a CDS. Denote 
by $(V_t^{CDS})$ the market value of the counterparty-risk-free CDS. It was shown in 
Proposition 17.1 that the credit value adjustment is an option on the CDS with 
$\tilde{T} = T_1$ and pay-off 

gAn, Ve) = 0 -YRA — YAR). 
Another example of an option on a traded credit product would be an option on a 
CDS index, as discussed in Frey and Schmidt (2011). 

It can be shown that the process $(Y_t, \pi_t)$ is a Markov process in the investor 
filtration so that the price at time $t$ of an option on a traded credit product is a 
function of $Y_t$ and $\pi_t$ of the form 

$$E^Q\left( \exp\left( -\int_t^{\tilde{T}} r(s)\, ds \right) I_{\{\tilde{T} \le T\}}\, g(Y_{\tilde{T}}, V_{\tilde{T},1}, \dots, V_{\tilde{T},N}) \,\Big|\, \mathcal{G}_t \right) = g(t, Y_t, \pi_t). \tag{17.66}$$

The actual evaluation of the function $g$ is usually based on Monte Carlo methods. 
Note that for an option on traded credit products the function $g$ will in general 
depend on the entire dynamics of the process $(Y_t, \pi_t)$, as we will see in the analysis 
of credit value adjustments in the next section. 
Calibration. Calibration methodologies are a crucial part of the model of Frey and 
Schmidt (2012) for the following reason. Recall that we view the process $(Z_t)$ 
generating the filtration $(\mathcal{G}_t)$ as an abstract model-building device that is not directly linked 
to observable quantities. Consequently, the process $(\pi_t)$ is not directly observable by 
investors. On the other hand, pricing formulas in our model depend on the values of 
$Y_t$ and $\pi_t$. Since pricing formulas need to be evaluated in terms of publicly available 
information, a key point in the application of the model is therefore to determine 
the realization of $\pi_t$ at time $t$ from prices of basic traded credit products observed 
at that date. 

Suppose that $N$ basic credit products are traded at time $t$ at market prices $p_n^*$, 
$1 \le n \le N$, and denote by $h_n(t, V, Y_t)$ the value of contract $n$ in the artificial model 
where $V$ is known. We then need to find some probability vector $\pi = (\pi^1, \dots, \pi^K)'$ 
such that the model prices $V^n(\pi) = \sum_{k=1}^K \pi^k h_n(t, v_k, Y_t)$ are close to the observed 
prices $p_n^*$ for all $n$. The reader will note that this problem is similar to the calibration 
problem arising in the implied copula framework of Section 12.3.3, and we refer to 
that section for details of algorithms and numerical results. 
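
As a rough illustration of the calibration problem, the following sketch fits a probability vector to observed prices by minimizing the sum of squared pricing errors with an exponentiated-gradient iteration, which keeps the iterates on the probability simplex. This is a stand-in chosen for brevity and is not the algorithm of Section 12.3.3; the function names, the step size `eta`, the contract values `h` and the prices are all hypothetical.

```python
import math


def model_prices(pi, h):
    """Model prices sum_k pi^k * h[n][k], with h[n][k] = h_n(t, v_k, Y_t)."""
    return [sum(p * hk for p, hk in zip(pi, row)) for row in h]


def objective(pi, h, market):
    """Half the sum of squared differences between model and market prices."""
    return 0.5 * sum((m - p) ** 2 for m, p in zip(model_prices(pi, h), market))


def calibrate(h, market, steps=5000, eta=5.0):
    """Exponentiated-gradient descent on the simplex, started at the uniform vector."""
    K = len(h[0])
    pi = [1.0 / K] * K
    for _ in range(steps):
        err = [m - p for m, p in zip(model_prices(pi, h), market)]
        grad = [sum(err[n] * h[n][k] for n in range(len(h))) for k in range(K)]
        w = [pi[k] * math.exp(-eta * grad[k]) for k in range(K)]
        total = sum(w)
        pi = [x / total for x in w]  # renormalize so that pi stays a probability vector
    return pi


# Hypothetical example: two contracts, three factor states.
h = [[0.01, 0.05, 0.20], [0.02, 0.08, 0.30]]
market = model_prices([0.5, 0.3, 0.2], h)  # synthetic "observed" prices
pi_hat = calibrate(h, market)
```

With fewer contracts than states the fitted vector is in general not unique, which is why practical implementations typically add a regularization term or a prior.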


17.4.4 Collateralized Credit Value Adjustments and Contagion Effects 


We now study the impact of different price dynamics on the size of credit value 
adjustments and on the performance of collateralization strategies for a single-name 
CDS. We are particularly interested in the influence of contagion. To see that conta- 
gion might be relevant for the performance of collateralization strategies, consider 
the scenario in which the protection seller defaults before the maturity of the CDS. 
In such a case contagion might lead to a substantial increase in the credit spread of 
the reference entity (the firm on which the CDS is written) and hence in turn to a 
much higher replacement value for the CDS. In standard collateralization strategies 
this is taken into account in a fairly crude way, and the amount of collateral posted 
before the default may well be insufficient to replace the CDS. 

Our exposition is based on Frey and Rösler (2014). We use the model with 
incomplete information from the previous section for our analysis. Slightly extending 
the set-up of that section, we assume that the factor is a finite-state Markov chain 
$(\Psi_t)$ (details are not relevant for our discussion here). We consider two versions 
of the model that differ with respect to the amount of information that is 
available to investors. In the version with full information it is assumed that the process 
$(\Psi_t)$ is observable, so there are no contagion effects. In the version with 
incomplete information investors observe only the process $Z_t = \int_0^t a(\Psi_s)\, ds + B_t$ for 
$a(\psi) = c \ln \gamma(\psi)$ (and, of course, the default history). As we have seen before, 
under incomplete information there is default contagion caused by the updating of 
the conditional distribution of $(\Psi_t)$ at default times. 

In order to compute credit value adjustments and to measure the performance 
of collateralization strategies, we use the bilateral collateralized credit value 
adjustment (BCCVA) introduced in Section 17.2.2. The actual computation of credit value 
adjustments is mostly carried out using Monte Carlo simulation. The numerical 
experiments that follow are based on a Markov chain $(\Psi_t)$ with $K = 8$ states 
$v_1, \dots, v_8$, where $v_1$ is the best state (lowest default probabilities for all firms 
considered) and $v_8$ is the worst state. In order to calibrate the model we assume that the 
protection buyer B has a credit spread of 50 bp, the reference entity R has a credit 
spread of 1000 bp, and the protection seller S has a credit spread of 500 bp, so that 
B is of far better credit quality than S; moreover, the default correlations of the three 
firms are fixed to be $\rho_{BR} = 2.0\%$, $\rho_{BS} = 1.5\%$ and $\rho_{RS} = 5\%$. The results of the 
calibration exercise are given in Table 17.2. Note that the default intensities at any 
fixed time $t$ are comonotonic random variables. 


Table 17.2. Results of model calibration for the case study on credit value adjustments. 

State       v_1     v_2     v_3     v_4     v_5     v_6     v_7     v_8 

$\pi_0$     0.0810  0.0000  0.2831  0.0548  0.0000  0.0000  0.0000  0.5811 
$\gamma_B$  0.0000  0.0010  0.0027  0.0040  0.0050  0.0059  0.0091  0.0195 
$\gamma_R$  0.0031  0.0669  0.1187  0.1482  0.1687  0.1855  0.2393  0.3668 
$\gamma_S$  0.0007  0.0245  0.0482  0.0627  0.0732  0.0818  0.1108  0.1840 

Now we present the results of the simulation study. We begin with an analysis of 
the performance of popular collateralization strategies, for which we refer the reader 
to Section 17.2.2. The market value of the counterparty-risk-free CDS referencing 
R will be denoted by $(V_t^{CDS})$. We compare market-value collateralization where 
$C_t^{market} = V_t^{CDS}$, $t \le T$, and threshold collateralization where 

$$C_t^{M_1, M_2} = \big( (V_t^{CDS})^+ - M_1 \big)\, I_{\{(V_t^{CDS})^+ > M_1\}} - \big( (V_t^{CDS})^- - M_2 \big)\, I_{\{(V_t^{CDS})^- > M_2\}}$$

for thresholds $M_1$ and $M_2$. Note that market-value collateralization can be viewed 
as threshold collateralization with $M_1 = M_2 = 0$. 
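
The threshold rule can be written as a short function. The sketch below assumes that the collateral position equals the positive part of the CDS value in excess of $M_1$ minus the negative part in excess of $M_2$, so that no collateral is exchanged while the value stays between $-M_2$ and $M_1$; the function name and the sample values are illustrative.

```python
def threshold_collateral(v_cds, m1, m2):
    """Collateral position for CDS market value v_cds and thresholds m1, m2 >= 0."""
    positive_part = max(v_cds, 0.0)
    negative_part = max(-v_cds, 0.0)
    collateral = 0.0
    if positive_part > m1:
        collateral += positive_part - m1  # exposure above the threshold m1 is collateralized
    if negative_part > m2:
        collateral -= negative_part - m2  # symmetric rule for negative market values
    return collateral


# With both thresholds equal to zero the rule reduces to market-value collateralization.
market_value_case = threshold_collateral(0.03, 0.0, 0.0)
inside_band_case = threshold_collateral(0.01, 0.02, 0.02)
```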

Numerical results illustrating the performance of these collateralization strategies 
are given in Table 17.3. We see that market value collateralization is very effective in 
the model with complete information. The performance of threshold collateralization 
is also satisfactory, as can be seen by comparing the CCVA for a small positive 
threshold with the CCVA for the uncollateralized case. The CDVA is quite low for 
all collateralization strategies, since in the chosen example B is of far better credit 
quality than S, so $Q(\xi_1 = B)$ is very small. Under incomplete information, on 
the other hand, the performance of market-value and threshold collateralization is 
not entirely satisfactory. In particular, even for $M_1 = M_2 = 0$ the CCVA is quite 
high compared with the case of full information. The main reason for this is the 
fact that, because of the contagion effects, threshold collateralization systematically 
underestimates the market value of the CDS at $T_1$, which leads to losses for the 
protection buyer. Note that the joint distribution of the default times is the same in 
the two versions of the model, so the differences in the sizes of the value adjustments 
and in the performance of the collateralization strategies can be attributed to the 
different dynamics of credit spreads (contagion or no contagion) in the two model 
variants. Our case study therefore clearly shows that the dynamics of credit spreads 
matters in the management of counterparty credit risk. 

Finally, we show that for the given parameter values there is clear evidence for 
wrong-way risk. To demonstrate this we compare the correct value adjustments to 
the value adjustments computed via the simplified formulas (17.11) and (17.12), 
both in the case of no collateralization. The results are given in the last two rows 
of Table 17.3. It turns out that in both versions of the model the value adjustments 
computed via the correct formula are substantially larger than the adjustments 
computed from the simplified formulas. This suggests that in situations where there is 
a non-negligible default correlation between the protection seller and the reference 
entity, the simplified formulas may not be appropriate. 


Table 17.3. Value adjustments in the model with complete information and in the model with 
incomplete information under threshold collateralization and market-value collateralization 
($M_1 = M_2 = 0$). The nominal of the CDS is normalized to 1; all numbers are in basis points. 
In the last row we also report the value adjustment corresponding to the simplified value 
adjustment formulas (17.11) and (17.12). 

                               Full information        Partial information 
Threshold                      CCVA   CDVA   BCCVA     CCVA   CDVA   BCCVA 
M_1 = M_2 = 0                     0      0       0       35      0      35 
M_1 = M_2 = 0.02                 16      0      15       45      0      45 
M_1 = M_2 = 0.05                 38      1      37       60      0      60 
No collateralization with 
  (i) correct formula            93      1      92       83      1      82 
  (ii) simplified formula        68      6      62       54      4      49 


Notes and Comments 


There is a large literature on credit risk models with incomplete information. 
Kusuoka (1999), Duffie and Lando (2001), Giesecke and Goldberg (2004), Jar- 
row and Protter (2004), Coculescu, Geman and Jeanblanc (2008), Frey and Schmidt 
(2009), Çetin (2012) and Frey, Rösler and Lu (2014) all consider structural models 
in the spirit of the Merton model, where the values of assets and/or liabilities are not 
directly observable. The last three of these papers use stochastic filtering techniques 
to study structural models under incomplete information. 

Turning to reduced-form models with incomplete information, the models of 
Schönbucher (2004) and Collin-Dufresne, Goldstein and Helwege (2010) are 
similar to the model considered in Section 17.4.2. Both papers point out that the 
successive updating of the conditional distribution of the unobserved factor in reaction 
to incoming default observations generates information-driven default contagion. 
Duffie et al. (2009) assume that the unobservable factor $(\Psi_t)$ is given by an 
Ornstein-Uhlenbeck process. Their paper contains interesting empirical results; in particular, 
the analysis provides strong support for the assertion that an unobservable stochastic 
process driving default intensities (a so-called dynamic frailty) is needed on top of 
observable covariates in order to explain the clustering of defaults in historical data. 
The link between stochastic filtering and reduced-form models is explored exten- 
sively in Frey and Runggaldier (2010) and Frey and Schmidt (2012). Our analysis 
in Section 17.4.3 is largely based on these papers. The numerical results on the 
performance of collateralization strategies are due to Frey and Rösler (2014). A survey 
of credit risk modelling under incomplete information can be found in Frey and 
Schmidt (2011). 


Appendix 


A.1 Miscellaneous Definitions and Results 
A.1.1 Type of Distribution 


Definition A.1 (equality in type). Two rvs $V$ and $W$ (or their distributions) are 
said to be of the same type if there exist constants $a > 0$ and $b \in \mathbb{R}$ such that 
$V \stackrel{d}{=} aW + b$. 


In other words, distributions of the same type are obtained from one another by 
location and scale transformations. 


A.1.2 Generalized Inverses and Quantiles 


Let $T$ be an increasing function, i.e. a function satisfying $y > x \Rightarrow T(y) \ge T(x)$, 
with strict inequality on the right-hand side for some pair $y > x$. An increasing 
function may therefore have flat sections; if we want to rule this out, we stipulate 
that $T$ is strictly increasing, so $y > x \iff T(y) > T(x)$. We first note some 
useful facts concerning what happens when increasing transformations are applied 
to rvs. 


Lemma A.2. 
(i) If $X$ is an rv and $T$ is increasing, then $\{X \le x\} \subset \{T(X) \le T(x)\}$ and 

$$P(T(X) \le T(x)) = P(X \le x) + P(T(X) = T(x),\, X > x). \tag{A.1}$$

(ii) If $F$ is the df of the rv $X$, then $P(F(X) \le F(x)) = P(X \le x)$. 


The second statement follows from (A.1) by noting that, for any x, the event given 
by {F(X) = F(x), X > x} corresponds to a flat piece of the df F and therefore 
has zero probability mass. 

The generalized inverse of an increasing function $T$ is defined to be $T^{\leftarrow}(y) = 
\inf\{x : T(x) \ge y\}$, where we use the convention $\inf \emptyset = \infty$. Strictly speaking, 
this generalized inverse is known as the left-continuous generalized inverse. The 
following basic properties may be verified quite easily. 


Proposition A.3 (properties of the generalized inverse). For $T$ increasing, the 
following hold. 

(i) $T^{\leftarrow}$ is an increasing, left-continuous function. 

(ii) $T$ is continuous $\iff$ $T^{\leftarrow}$ is strictly increasing. 

(iii) $T$ is strictly increasing $\iff$ $T^{\leftarrow}$ is continuous. 

For the remaining properties assume additionally that $-\infty < T^{\leftarrow}(y) < \infty$. 

(iv) If $T$ is right continuous, $T(x) \ge y \iff T^{\leftarrow}(y) \le x$. 

(v) $T^{\leftarrow} \circ T(x) \le x$. 

(vi) $T$ is right continuous $\Rightarrow$ $T \circ T^{\leftarrow}(y) \ge y$. 

(vii) $T$ is strictly increasing $\Rightarrow$ $T^{\leftarrow} \circ T(x) = x$. 

(viii) $T$ is continuous $\Rightarrow$ $T \circ T^{\leftarrow}(y) = y$. 


Figure A.1. Calculation of quantiles in tricky cases. The first case (a) is a continuous df, 
but the flat piece corresponds to an interval with zero probability mass. In the second case (b) 
there is an atom of probability mass such that, for $X$ with df $F$, we have $P(X = q_\alpha(F)) > 0$. 


We apply the idea of generalized inverses to distribution functions. If $F$ is a df, 
then the generalized inverse $F^{\leftarrow}$ is known as the quantile function of $F$. In this 
case, for $\alpha \in (0, 1)$, we also use the alternative notation $q_\alpha(F) = F^{\leftarrow}(\alpha)$ for the 
$\alpha$-quantile of $F$. Figure A.1 illustrates the calculation of quantiles in two tricky 
cases. 
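
For a discrete distribution the quantile function can be evaluated by accumulating probability mass until the level $\alpha$ is reached. The following sketch assumes a df with finitely many atoms listed in increasing order; the function names and the example distribution are illustrative, and exact rational arithmetic is used so that comparisons at the jump levels of the df are not affected by rounding.

```python
from fractions import Fraction


def df(xs, probs, x):
    """F(x) = P(X <= x) for a discrete rv with atoms xs and masses probs."""
    return sum(p for xi, p in zip(xs, probs) if xi <= x)


def quantile(xs, probs, alpha):
    """Generalized inverse inf{x : F(x) >= alpha}; xs must be sorted increasingly."""
    cumulative = 0
    for xi, p in zip(xs, probs):
        cumulative += p
        if cumulative >= alpha:
            return xi
    return float("inf")  # convention inf(empty set) = infinity


# Hypothetical distribution with atoms at 1, 2 and 4.
xs = [1, 2, 4]
probs = [Fraction(1, 5), Fraction(3, 10), Fraction(1, 2)]

q_at_jump = quantile(xs, probs, Fraction(1, 5))        # alpha equal to F(1)
q_flat = quantile(xs, probs, df(xs, probs, 3))         # F is flat on [2, 4)
```

Here `q_flat` evaluates the composition of the quantile function with the df at the point 3, which lies on a flat piece of the df; the result is the left endpoint 2 of that piece, in line with Proposition A.3 (v).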

In general, since a df need not be strictly increasing (part (a) of the figure), we have 
$F^{\leftarrow} \circ F(x) \le x$, by Proposition A.3 (v). But the values $x$, where $F^{\leftarrow} \circ F(x) \ne x$, 
correspond to flat pieces and have zero probability mass. That is, we have the 
following useful fact. 


Proposition A.4. If $X$ is an rv with df $F$, then $P(F^{\leftarrow} \circ F(X) = X) = 1$. 


The following proposition shows how quantiles can be computed for transformed 
random variables and it is used in Section 7.2.1 to prove the comonotone additivity 
of value-at-risk. 


Proposition A.5. For a random variable $X$ and an increasing, left-continuous 
function $T$, the quantile function of $T(X)$ is given by 

$$F_{T(X)}^{\leftarrow}(\alpha) = T(F_X^{\leftarrow}(\alpha)).$$


Proof. Since $F_{T(X)}$ is a right-continuous function, Proposition A.3 (iv) tells us that 
for any point $x$ we have the equivalence 

$$F_{T(X)}^{\leftarrow}(\alpha) \le x \iff F_{T(X)}(x) \ge \alpha. \tag{A.2}$$

Let $y_0(x) = \sup\{y : T(y) \le x\}$ and observe that the left continuity of $T$ implies 

$$T(y) \le x \iff y \le y_0(x). \tag{A.3}$$

From (A.3) we conclude that the events $\{T(X) \le x\}$ and $\{X \le y_0(x)\}$ are identical, 
from which it follows that 

$$F_{T(X)}(x) \ge \alpha \iff F_X(y_0(x)) \ge \alpha. \tag{A.4}$$

We also have that 

$$F_X(y_0(x)) \ge \alpha \overset{(a)}{\iff} F_X^{\leftarrow}(\alpha) \le y_0(x) \overset{(b)}{\iff} T(F_X^{\leftarrow}(\alpha)) \le x, \tag{A.5}$$

where we again use Proposition A.3 (iv) to establish (a) and we use (A.3) to 
establish (b). The equivalences (A.2), (A.4) and (A.5) together with the fact that $x$ is an 
arbitrary point prove the proposition. 


A.1.3 Distributional Transform 


The distributional transform can be used to prove the general version of Sklar’s The- 
orem (Theorem 7.3), an insight due to Rüschendorf (2009). For a random variable 
X with distribution function F, define the modified distribution function by 


$$F(x, \lambda) = P(X < x) + \lambda P(X = x), \quad \lambda \in [0, 1]. \tag{A.6}$$


Obviously we have $F(x, \lambda) = F(x-) + \lambda(F(x) - F(x-))$, and if $F$ is continuous 
at $x$, then $F(x, \lambda) = F(x)$, but in general $F(x-) \le F(x, \lambda) \le F(x)$. 
The distributional transform of X is given by 


U := F(X, V), (A.7) 


where $V \sim U(0, 1)$ is a uniform rv independent of $X$ and $F(x, \lambda)$ is as in (A.6). The 
following result shows how the distributional transform generalizes the probability 
transform of Proposition 7.2. 


Proposition A.6. For an rv $X$ with df $F$ let $U = F(X, V)$ be the distributional 
transform of $X$. Then $U \sim U(0, 1)$ and $X = F^{\leftarrow}(U)$ almost surely. 


Proof. For given $u$ we compute $P(U \le u)$. Let $q(u) = P(X < F^{\leftarrow}(u))$ and 
$p(u) = P(X = F^{\leftarrow}(u))$ and observe that 

$$\{F(X, V) \le u\} = \{X < F^{\leftarrow}(u)\} \cup \{X = F^{\leftarrow}(u),\ q(u) + V p(u) \le u\}.$$

There are two cases to consider: either $F$ is continuous at the $u$-quantile $F^{\leftarrow}(u)$, in 
which case $p(u) = 0$ and $q(u) = u$; or $F$ has a jump at the $u$-quantile, in which 
case $p(u) > 0$. In the former case 

$$P(U \le u) = P(X < F^{\leftarrow}(u)) = q(u) = u,$$

and in the latter case 

$$P(U \le u) = q(u) + p(u)\, P\left( V \le \frac{u - q(u)}{p(u)} \right) = u,$$

which proves that $U$ has a uniform distribution. 

To prove the second assertion fix $\omega \in \Omega$ and let $x = X(\omega)$ and $u = U(\omega) = 
F(X(\omega), V(\omega))$. Clearly we have $F(x-) \le u \le F(x)$. If $F(x-) < u \le F(x)$ then 
$F^{\leftarrow}(u) = x$, but if $F(x-) = u$ then we may have that $F^{\leftarrow}(u) < x$. However, the 
values of $\omega$ for which the latter situation may occur comprise a null set. 
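
The uniformity of the distributional transform can be verified exactly for a discrete distribution. Conditional on $X = x$, the transform $U = F(x-) + V\,P(X = x)$ is uniform on $[F(x-), F(x)]$, so $P(U \le u)$ is a finite sum. The following sketch, with an illustrative function name and a made-up distribution, evaluates this sum in exact rational arithmetic.

```python
from fractions import Fraction


def cdf_of_distributional_transform(probs, u):
    """P(U <= u) for U = F(X, V), where X is discrete with strictly positive
    masses probs (atoms in increasing order) and V ~ U(0, 1) is independent of X."""
    total = Fraction(0)
    left_limit = Fraction(0)  # F(x-) for the current atom
    for p in probs:
        # Given X = x, U is uniform on [F(x-), F(x-) + p].
        conditional = (u - left_limit) / p
        conditional = min(max(conditional, Fraction(0)), Fraction(1))
        total += p * conditional
        left_limit += p
    return total


# Hypothetical three-point distribution.
probs = [Fraction(1, 5), Fraction(3, 10), Fraction(1, 2)]
levels = [Fraction(0), Fraction(1, 10), Fraction(1, 5), Fraction(1, 3),
          Fraction(1, 2), Fraction(99, 100), Fraction(1)]
values = [cdf_of_distributional_transform(probs, u) for u in levels]
```

For every level in `levels` the computed probability coincides with the level itself, as Proposition A.6 asserts.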


A.1.4 Karamata’s Theorem 


The following result for regularly varying functions is used in Chapter 5. For more 
details see Bingham, Goldie and Teugels (1987). Essentially, the result says that the 
slowly varying function can be taken outside the integral as if it were a constant. Note 
that the symbol “~” indicates asymptotic equality here, i.e. if we write a(x) ~ b(x) 
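as $x \to x_0$, we mean $\lim_{x \to x_0} a(x)/b(x) = 1$. 


Theorem A.7 (Karamata's Theorem). Let $L$ be a slowly varying function that is 
locally bounded in $[x_0, \infty)$ for some $x_0 \ge 0$. Then, 

(a) for $\kappa > -1$, 
$$\int_{x_0}^x t^\kappa L(t)\, dt \sim \frac{1}{\kappa + 1}\, x^{\kappa + 1} L(x), \quad x \to \infty,$$

(b) for $\kappa < -1$, 
$$\int_x^\infty t^\kappa L(t)\, dt \sim -\frac{1}{\kappa + 1}\, x^{\kappa + 1} L(x), \quad x \to \infty.$$

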
A.1.5 Supporting and Separating Hyperplane Theorems 


The following result on the existence of supporting and separating hyperplanes 
is needed at various points in Chapter 8. For further information we refer to 
Appendix B2 in Bertsekas (1999), Section 2.5 in Boyd and Vandenberghe (2004) 
and Chapter 11 of Rockafellar (1970). 


Proposition A.8. Consider a convex set $C \subset \mathbb{R}^n$ and some $x_0 \in \mathbb{R}^n$. 


(a) If $x_0$ is not an interior point of $C$, there exists a supporting hyperplane for $C$ 
through $x_0$, i.e. there is some $u \in \mathbb{R}^n \setminus \{0\}$ such that $u'x_0 \ge \sup\{u'x : x \in C\}$. 


(b) If $x_0$ does not belong to the closure $\bar{C}$ of $C$, one has strict separation, i.e. there 
is some $u \in \mathbb{R}^n \setminus \{0\}$ such that $u'x_0 > \sup\{u'x : x \in C\}$. 


A.2 Probability Distributions 


The gamma and beta functions appear in the definitions of a number of these 
distributions. The gamma function is 

$$\Gamma(\alpha) = \int_0^\infty x^{\alpha - 1} e^{-x}\, dx, \quad \alpha > 0, \tag{A.8}$$

and it satisfies the useful recursive relationship $\Gamma(\alpha + 1) = \alpha \Gamma(\alpha)$. The beta 
function is 

$$\beta(a, b) = \int_0^1 x^{a - 1} (1 - x)^{b - 1}\, dx, \quad a, b > 0. \tag{A.9}$$

It is related to the gamma function by $\beta(a, b) = \Gamma(a)\Gamma(b)/\Gamma(a + b)$. 
A.2.1 Beta 


The rv $X$ has a beta distribution, written $X \sim \text{Beta}(a, b)$, if its density is 

$$f(x) = \frac{1}{\beta(a, b)}\, x^{a - 1} (1 - x)^{b - 1}, \quad 0 < x < 1, \quad a, b > 0, \tag{A.10}$$

where $\beta(a, b)$ is the beta function in (A.9). The uniform distribution $U(0, 1)$ is 
obtained as a special case when $a = b = 1$. The mean and variance of the distribution 
are, respectively, $E(X) = a/(a + b)$ and $\operatorname{var}(X) = (ab)/((a + b + 1)(a + b)^2)$. 


A.2.2 Exponential 

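The rv $X$ has an exponential distribution, written $X \sim \text{Exp}(\lambda)$, if its density is 

$$f(x) = \lambda e^{-\lambda x}, \quad x > 0, \quad \lambda > 0. \tag{A.11}$$

The mean of this distribution is $E(X) = \lambda^{-1}$ and the variance is $\operatorname{var}(X) = \lambda^{-2}$. 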


A.2.3 F 


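The rv $X$ has an F distribution, written $X \sim F(\nu_1, \nu_2)$, if its density is 

$$f(x) = \frac{1}{\beta(\nu_1/2, \nu_2/2)} \left( \frac{\nu_1}{\nu_2} \right)^{\nu_1/2} \frac{x^{(\nu_1 - 2)/2}}{(1 + \nu_1 x/\nu_2)^{(\nu_1 + \nu_2)/2}}, \quad x > 0, \quad \nu_1, \nu_2 > 0. \tag{A.12}$$

The mean of this distribution is $E(X) = \nu_2/(\nu_2 - 2)$ provided that $\nu_2 > 2$. Provided 
that $\nu_2 > 4$, the variance is 

$$\operatorname{var}(X) = 2 \left( \frac{\nu_2}{\nu_2 - 2} \right)^2 \frac{\nu_1 + \nu_2 - 2}{\nu_1 (\nu_2 - 4)}.$$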


A.2.4 Gamma 


The rv $X$ has a gamma distribution, written $X \sim \text{Ga}(\alpha, \beta)$, if its density is 

$$f(x) = \frac{\beta^\alpha}{\Gamma(\alpha)}\, x^{\alpha - 1} e^{-\beta x}, \quad x > 0, \quad \alpha > 0, \quad \beta > 0, \tag{A.13}$$

where $\Gamma(\alpha)$ denotes the gamma function in (A.8). Using the recursive property of 
the gamma function, the mean and variance of the gamma distribution are easily 
calculated to be $E(X) = \alpha/\beta$ and $\operatorname{var}(X) = \alpha/\beta^2$. For fitting a multivariate t 
distribution using the EM approach of Section 15.1.1 it is also useful to know that 
$E(\ln X) = \psi(\alpha) - \ln(\beta)$, where $\psi(k) = d\ln(\Gamma(k))/dk$ is the digamma or psi 
function. 

An exponential distribution is obtained in the special case when $\alpha = 1$. If $X \sim 
\text{Ga}(\alpha, \beta)$ and $k > 0$, then $kX \sim \text{Ga}(\alpha, \beta/k)$. For two independent gamma variates 
$X_1 \sim \text{Ga}(\alpha_1, \beta)$ and $X_2 \sim \text{Ga}(\alpha_2, \beta)$, we have that $X_1 + X_2 \sim \text{Ga}(\alpha_1 + \alpha_2, \beta)$. 
Note also that, if $X \sim \text{Ga}(\tfrac{1}{2}\nu, \tfrac{1}{2})$, then $X$ has a chi-squared distribution with $\nu$ 
degrees of freedom, also written $X \sim \chi_\nu^2$. 
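A.2.5 Generalized Inverse Gaussian 


The rv $X$ has a generalized inverse Gaussian (GIG) distribution, written $X \sim 
N^{-}(\lambda, \chi, \psi)$, if its density is 

$$f(x) = \frac{\chi^{-\lambda} (\sqrt{\chi\psi})^{\lambda}}{2 K_\lambda(\sqrt{\chi\psi})}\, x^{\lambda - 1} \exp\left( -\tfrac{1}{2} (\chi x^{-1} + \psi x) \right), \quad x > 0, \tag{A.14}$$

where $K_\lambda$ denotes a modified Bessel function of the third kind with index $\lambda$ and the 
parameters satisfy $\chi > 0$, $\psi \ge 0$ if $\lambda < 0$; $\chi > 0$, $\psi > 0$ if $\lambda = 0$; and $\chi \ge 0$, 
$\psi > 0$ if $\lambda > 0$. For more on this Bessel function see Abramowitz and Stegun 
(1965). 

The GIG density actually contains the gamma and inverse gamma densities 
as special limiting cases, corresponding to $\chi = 0$ and $\psi = 0$, respectively. 
In these cases (A.14) must be interpreted as a limit, which can be evaluated 
using the asymptotic relations $K_\lambda(x) \sim \Gamma(\lambda) 2^{\lambda - 1} x^{-\lambda}$ as $x \to 0+$ for $\lambda > 0$ and 
$K_\lambda(x) \sim \Gamma(-\lambda) 2^{-\lambda - 1} x^{\lambda}$ as $x \to 0+$ for $\lambda < 0$. The fact that $K_\lambda(x) = K_{-\lambda}(x)$ is 
also useful. In this way it can be verified that, for $\lambda > 0$ and $\chi = 0$, $X \sim \text{Ga}(\lambda, \tfrac{1}{2}\psi)$. 
If $\lambda < 0$ and $\psi = 0$, we have $X \sim \text{Ig}(-\lambda, \tfrac{1}{2}\chi)$. The case $\lambda = -\tfrac{1}{2}$ is known as the 
inverse Gaussian distribution. Note that, in general, if $Y \sim N^{-}(\lambda, \chi, \psi)$, then 
$1/Y \sim N^{-}(-\lambda, \psi, \chi)$. 

For the non-limiting case when $\chi > 0$ and $\psi > 0$ it may be calculated that 

$$E(X^\alpha) = \left( \frac{\chi}{\psi} \right)^{\alpha/2} \frac{K_{\lambda + \alpha}(\sqrt{\chi\psi})}{K_\lambda(\sqrt{\chi\psi})}, \quad \alpha \in \mathbb{R}, \tag{A.15}$$

$$E(\ln X) = \left. \frac{dE(X^\alpha)}{d\alpha} \right|_{\alpha = 0}. \tag{A.16}$$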


A.2.6 Inverse Gamma 


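The rv $X$ has an inverse gamma distribution, written $X \sim \text{Ig}(\alpha, \beta)$, if its density is 

$$f(x) = \frac{\beta^\alpha}{\Gamma(\alpha)}\, x^{-(\alpha + 1)} e^{-\beta/x}, \quad x > 0, \quad \alpha > 0, \quad \beta > 0. \tag{A.17}$$

Note that if $Y \sim \text{Ga}(\alpha, \beta)$, then $1/Y \sim \text{Ig}(\alpha, \beta)$. Provided that $\alpha > 1$, the 
mean is $E(X) = \beta/(\alpha - 1)$, and provided that $\alpha > 2$, the variance is $\operatorname{var}(X) = 
\beta^2/((\alpha - 1)^2 (\alpha - 2))$. Moreover, $E(\ln X) = \ln(\beta) - \psi(\alpha)$. 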


A.2.7 Negative Binomial 

The rv $N$ has a negative binomial distribution with parameters $\alpha > 0$ and $0 < p < 1$, 
written $N \sim \text{NB}(\alpha, p)$, if its probability mass function is 

$$P(N = k) = \binom{\alpha + k - 1}{k} p^\alpha (1 - p)^k, \quad k = 0, 1, 2, \dots, \tag{A.18}$$

where $\binom{x}{k}$ for $x \in \mathbb{R}$ and $k \in \mathbb{N}_0$ denotes an extended binomial coefficient defined 
by $\binom{x}{0} = 1$ and 

$$\binom{x}{k} = \frac{x(x - 1) \cdots (x - k + 1)}{k!}, \quad k > 0.$$

The moments of this distribution are 
$E(N) = \alpha(1 - p)/p$ and $\operatorname{var}(N) = \alpha(1 - p)/p^2$. 


For $\alpha = r \in \mathbb{N}$ the rv $N + r$ represents the waiting time until the $r$th success 
in independent Bernoulli trials with success probability $p$, i.e. the total number of 
trials that are required until we have $r$ successes. For $\alpha = 1$ the rv $N + 1$ is said to 
have a geometric distribution. 
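
The extended binomial coefficient makes the probability mass function easy to evaluate for non-integer values of the first parameter. The following sketch uses illustrative function names and parameter values; it computes the mass function from the product formula for the coefficient and accumulates the total mass and the mean over a truncated range of $k$.

```python
def extended_binomial(x, k):
    """Extended binomial coefficient x(x-1)...(x-k+1)/k! for real x and integer k >= 0."""
    result = 1.0
    for j in range(k):
        result *= (x - j) / (j + 1)
    return result


def nb_pmf(k, alpha, p):
    """P(N = k) for N ~ NB(alpha, p)."""
    return extended_binomial(alpha + k - 1, k) * p ** alpha * (1 - p) ** k


# Hypothetical parameters with a non-integer first parameter.
alpha, p = 2.5, 0.4
total_mass = sum(nb_pmf(k, alpha, p) for k in range(400))
mean = sum(k * nb_pmf(k, alpha, p) for k in range(400))
theoretical_mean = alpha * (1 - p) / p
```

The truncation at 400 terms is an arbitrary choice; for these parameter values the neglected tail mass is far below floating-point precision.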


A.2.8 Pareto 


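The rv $X$ has a Pareto distribution, written $X \sim \mathrm{Pa}(\alpha, \kappa)$, if its df is

$$F(x) = 1 - \left(\frac{\kappa}{\kappa + x}\right)^{\alpha}, \quad \alpha, \kappa > 0,\ x \geq 0. \tag{A.19}$$

Provided that $\alpha > n$, the moments of this distribution are given by

$$E(X^n) = \frac{\kappa^n\, n!}{\prod_{i=1}^{n} (\alpha - i)}.$$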
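
Because the df (A.19) can be inverted in closed form, Pareto variates are easy to generate by the quantile transform. The following sketch assumes the parametrization $F(x) = 1 - (\kappa/(\kappa + x))^\alpha$ and uses illustrative names and parameter values; it compares the simulated mean with the moment formula.

```python
import math
import random


def pareto_cdf(x, alpha, kappa):
    """Distribution function (A.19) of Pa(alpha, kappa) for x >= 0."""
    return 1.0 - (kappa / (kappa + x)) ** alpha


def pareto_quantile(u, alpha, kappa):
    """Inverse of (A.19): the value x with F(x) = u, for 0 <= u < 1."""
    return kappa * ((1.0 - u) ** (-1.0 / alpha) - 1.0)


def pareto_moment(n, alpha, kappa):
    """E(X^n) = kappa^n n! / prod_{i=1}^n (alpha - i), valid for alpha > n."""
    if alpha <= n:
        raise ValueError("the nth moment exists only for alpha > n")
    denominator = math.prod(alpha - i for i in range(1, n + 1))
    return kappa ** n * math.factorial(n) / denominator


rng = random.Random(1)  # fixed seed so that the run is reproducible
alpha, kappa = 6.0, 2.0  # illustrative values only
draws = [pareto_quantile(rng.random(), alpha, kappa) for _ in range(200_000)]
sample_mean = sum(draws) / len(draws)
print("mean:", sample_mean, "theory:", pareto_moment(1, alpha, kappa))
```

The same quantile function gives value-at-risk type quantities directly, for example `pareto_quantile(0.99, alpha, kappa)`.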


A.2.9 Stable 


The rv $X$ has an $\alpha$-stable distribution, written $X \sim \mathrm{St}(\alpha, \beta, \gamma, \delta)$, if its characteristic
function is

$$\phi(t) = E(e^{itX}) = \begin{cases} \exp\bigl(-\gamma^{\alpha} |t|^{\alpha} (1 - i\beta\,\mathrm{sign}(t) \tan(\pi\alpha/2)) + i\delta t\bigr), & \alpha \neq 1, \\ \exp\bigl(-\gamma |t| (1 + i\beta\,\mathrm{sign}(t)(2/\pi) \ln|t|) + i\delta t\bigr), & \alpha = 1, \end{cases} \tag{A.20}$$

where $\alpha \in (0, 2]$, $\beta \in [-1, 1]$, $\gamma > 0$ and $\delta \in \mathbb{R}$. Note that there are various
alternative parametrizations of the stable distributions and we use a parametrization
of Nolan (2003, Definition 1.8). The case $X \sim \mathrm{St}(\alpha, 1, \gamma, 0)$ for $\alpha < 1$ gives a distribution
on the positive half-axis, which we refer to as a positive stable distribution.

A simulation algorithm for a standardized variate $Z \sim \mathrm{St}(\alpha, \beta, 1, 0)$ is given
in Nolan (2003, Theorem 1.19). In the case where $\alpha \neq 1$, $X = \delta + \gamma Z$ has a
$\mathrm{St}(\alpha, \beta, \gamma, \delta)$ distribution; the case $\alpha = 1$ is more complicated.
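
The characteristic function (A.20) is straightforward to evaluate numerically, which is useful when checking special cases of the parametrization. The sketch below is an illustration only (the function name `stable_cf` is an assumption); it verifies that $\alpha = 2$, $\beta = 0$ gives the characteristic function of a normal distribution with mean $\delta$ and variance $2\gamma^2$.

```python
import cmath
import math


def stable_cf(t, alpha, beta, gamma, delta):
    """Characteristic function (A.20) of St(alpha, beta, gamma, delta) at real t."""
    if not (0 < alpha <= 2 and -1 <= beta <= 1 and gamma > 0):
        raise ValueError("need alpha in (0, 2], beta in [-1, 1] and gamma > 0")
    if t == 0:
        return complex(1.0, 0.0)  # phi(0) = 1; also avoids ln|0| when alpha = 1
    sign = math.copysign(1.0, t)
    if alpha != 1:
        skew = 1 - 1j * beta * sign * math.tan(math.pi * alpha / 2)
        exponent = -(gamma ** alpha) * abs(t) ** alpha * skew + 1j * delta * t
    else:
        skew = 1 + 1j * beta * sign * (2 / math.pi) * math.log(abs(t))
        exponent = -gamma * abs(t) * skew + 1j * delta * t
    return cmath.exp(exponent)


# alpha = 2, beta = 0: normal distribution with mean delta and variance 2 * gamma^2.
t, gamma, delta = 0.7, 1.5, 0.3
gaussian_case = stable_cf(t, 2.0, 0.0, gamma, delta)
normal_cf = cmath.exp(1j * delta * t - 0.5 * (2 * gamma ** 2) * t ** 2)
print(gaussian_case, normal_cf)

# alpha = 1, beta = 0: Cauchy distribution with location delta and scale gamma.
cauchy_case = stable_cf(t, 1.0, 0.0, gamma, delta)
print(cauchy_case, cmath.exp(1j * delta * t - gamma * abs(t)))
```

The function only evaluates the transform; it does not simulate stable variates, for which the algorithm cited in the text is needed.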


A.3 Likelihood Inference 


This appendix summarizes the mechanics of performing likelihood inference, but 
omits theoretical details. A good starting reference for the theory is Casella and 
Berger (2002), which we refer to in this appendix where relevant. Other useful 
books include Serfling (1980), Lehmann (1983, 1986), Schervish (1995) and Stuart, 
Ord and Arnold (1999), all of which give details concerning the famous regularity 
conditions that are required for the asymptotic statements. 

A.3.1 Maximum Likelihood Estimators


Suppose that the random vector $\boldsymbol{X} = (X_1, \ldots, X_n)'$ has joint probability density (or
mass function) in some parametric family $f_{\boldsymbol{X}}(\boldsymbol{x}; \boldsymbol{\theta})$, indexed by a parameter vector
$\boldsymbol{\theta} = (\theta_1, \ldots, \theta_p)'$ in a parameter space $\Theta$. We consider our data to be a realization
of $\boldsymbol{X}$ for some unknown value of $\boldsymbol{\theta}$.

The likelihood function for the parameter vector $\boldsymbol{\theta}$ given the data is
$L(\boldsymbol{\theta}; \boldsymbol{X}) = f_{\boldsymbol{X}}(\boldsymbol{X}; \boldsymbol{\theta})$, and the maximum likelihood estimator (MLE) $\hat{\boldsymbol{\theta}}$ is the value of $\boldsymbol{\theta}$
maximizing $L(\boldsymbol{\theta}; \boldsymbol{X})$, or equivalently the value maximizing the log-likelihood function
$l(\boldsymbol{\theta}; \boldsymbol{X}) = \ln L(\boldsymbol{\theta}; \boldsymbol{X})$. We will also write this estimator as $\hat{\boldsymbol{\theta}}_n$ when we want to
emphasize its dependence on the sample size $n$.

For large $n$ we expect that the estimate $\hat{\boldsymbol{\theta}}_n$ will be close to the true value $\boldsymbol{\theta}$,
and various well-known asymptotic results give information about the quality of
the estimator in large samples. In describing these results we consider the classical
situation where $\boldsymbol{X}$ is assumed to be a vector of iid components with univariate density
$f$, so

$$\ln L(\boldsymbol{\theta}; \boldsymbol{X}) = \ln \prod_{i=1}^{n} f(X_i; \boldsymbol{\theta}) = \sum_{i=1}^{n} \ln L(\boldsymbol{\theta}; X_i).$$

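
To make the mechanics concrete, the following sketch maximizes an iid log-likelihood numerically and compares the result with a closed-form MLE. The exponential density $f(x; \theta) = \theta e^{-\theta x}$, the golden-section optimizer and all names and settings are illustrative assumptions rather than anything prescribed in the text.

```python
import math
import random


def log_likelihood(theta, data):
    """ln L(theta; X) = sum_i ln f(X_i; theta) for f(x; theta) = theta * exp(-theta * x)."""
    return sum(math.log(theta) - theta * x for x in data)


def maximize_1d(func, lower, upper, tol=1e-8):
    """Golden-section search for the maximizer of a unimodal function on [lower, upper]."""
    ratio = (math.sqrt(5) - 1) / 2
    a, b = lower, upper
    while b - a > tol:
        c = b - ratio * (b - a)
        d = a + ratio * (b - a)
        if func(c) > func(d):
            b = d  # the maximum lies in [a, d]
        else:
            a = c  # the maximum lies in [c, b]
    return 0.5 * (a + b)


rng = random.Random(2)  # fixed seed so that the run is reproducible
true_theta = 2.0  # illustrative value only
data = [rng.expovariate(true_theta) for _ in range(2000)]

theta_numeric = maximize_1d(lambda th: log_likelihood(th, data), 1e-6, 50.0)
theta_closed = len(data) / sum(data)  # for this model the MLE is 1 / sample mean
print(theta_numeric, theta_closed)
```

In models without a closed-form solution only the numerical route is available, and in several dimensions a general-purpose optimizer would replace the one-dimensional search used here.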
A.3.2 Asymptotic Results: Scalar Parameter 


We consider the case when $p = 1$ and we have a single parameter $\theta$. Under suitable
regularity conditions (see, for example, Casella and Berger 2002, p. 516), $\hat\theta_n$ may be
shown to be a consistent estimator of $\theta$ (i.e. tending to $\theta$ in probability as the sample
size $n$ is increased). Notable among the regularity conditions are that $\theta$ should be
an identifiable parameter ($\theta \neq \tilde\theta \Rightarrow f(x; \theta) \neq f(x; \tilde\theta)$), that the true parameter $\theta$
should be an interior point of the parameter space $\Theta$, and that the support of $f(x; \theta)$
should not depend on $\theta$.

Under stronger regularity conditions (see again Casella and Berger 2002, p. 516),
$\hat\theta_n$ may be shown to be an asymptotically efficient estimator of $\theta$, so it satisfies

$$\sqrt{n}(\hat\theta_n - \theta) \xrightarrow{d} N(0, I(\theta)^{-1}), \tag{A.21}$$

where $I(\theta)$ denotes the Fisher information of an observation, defined by

$$I(\theta) = E\left(\left(\frac{\partial}{\partial\theta} \ln L(\theta; X)\right)^{2}\right). \tag{A.22}$$

Under the regularity conditions, the Fisher information can generally also be calculated as

$$I(\theta) = -E\left(\frac{\partial^2}{\partial\theta^2} \ln L(\theta; X)\right). \tag{A.23}$$

Asymptotic efficiency entails both asymptotic normality and consistency. Moreover,
it implies that, in a large enough sample, $\mathrm{var}(\hat\theta_n) \approx 1/(nI(\theta))$, where the right-hand
side is the so-called Cramér-Rao lower bound, which is a lower bound for the
variance of an unbiased estimator of $\theta$ constructed from an iid sample $X_1, \ldots, X_n$.
The MLE is efficient in the sense that it attains this lowest possible bound asymptotically.
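
As a worked illustration of (A.22) and (A.23), consider a single observation $X$ from the exponential density $f(x; \theta) = \theta e^{-\theta x}$, $x > 0$, $\theta > 0$. This model is chosen here purely as an example; it is not singled out in the text.

```latex
\begin{align*}
\ln L(\theta; X) &= \ln\theta - \theta X, \\
\frac{\partial}{\partial\theta} \ln L(\theta; X) &= \frac{1}{\theta} - X,
\qquad
\frac{\partial^2}{\partial\theta^2} \ln L(\theta; X) = -\frac{1}{\theta^2}, \\
I(\theta) &= E\Bigl(\bigl(\tfrac{1}{\theta} - X\bigr)^2\Bigr) = \mathrm{var}(X) = \frac{1}{\theta^2}
           = -E\Bigl(-\frac{1}{\theta^2}\Bigr), \\
\mathrm{var}(\hat\theta_n) &\approx \frac{1}{n I(\theta)} = \frac{\theta^2}{n}.
\end{align*}
```

Both expressions for the Fisher information agree because $E(X) = 1/\theta$ and $\mathrm{var}(X) = 1/\theta^2$, and the large-sample standard deviation of the MLE is therefore approximately $\theta/\sqrt{n}$.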
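
A.3.3 Asymptotic Results: Vector of Parameters


When $p > 1$ and we have a vector of parameters to estimate, similar results apply.
The MLE $\hat{\boldsymbol{\theta}}_n$ of $\boldsymbol{\theta}$ is asymptotically efficient in the sense that, as $n \to \infty$ and under
suitable regularity conditions,

$$\sqrt{n}(\hat{\boldsymbol{\theta}}_n - \boldsymbol{\theta}) \xrightarrow{d} N_p(\boldsymbol{0}, I(\boldsymbol{\theta})^{-1}), \tag{A.24}$$

where $I(\boldsymbol{\theta})$ denotes the expected Fisher information matrix for a single observation,
given, in analogy to (A.22) and (A.23), by

$$I(\boldsymbol{\theta}) = E\left(\frac{\partial}{\partial\boldsymbol{\theta}} \ln L(\boldsymbol{\theta}; X) \frac{\partial}{\partial\boldsymbol{\theta}'} \ln L(\boldsymbol{\theta}; X)\right) = -E\left(\frac{\partial^2}{\partial\boldsymbol{\theta}\,\partial\boldsymbol{\theta}'} \ln L(\boldsymbol{\theta}; X)\right).$$

The notation employed here should be taken to mean a matrix with components

$$I(\boldsymbol{\theta})_{ij} = E\left(\frac{\partial}{\partial\theta_i} \ln L(\boldsymbol{\theta}; X) \frac{\partial}{\partial\theta_j} \ln L(\boldsymbol{\theta}; X)\right) = -E\left(\frac{\partial^2}{\partial\theta_i\,\partial\theta_j} \ln L(\boldsymbol{\theta}; X)\right).$$

The convergence result (A.24) implies that, for $n$ sufficiently large, we have

$$\hat{\boldsymbol{\theta}}_n \sim N_p(\boldsymbol{\theta}, n^{-1} I(\boldsymbol{\theta})^{-1}), \tag{A.25}$$

and this can be used to construct asymptotic confidence regions for $\boldsymbol{\theta}$ or intervals
for any component $\theta_j$. In practice, it is often easier to approximate $I(\boldsymbol{\theta})$ with the
observed Fisher information matrix

$$\bar{I}(\boldsymbol{\theta}) = -\frac{1}{n} \sum_{i=1}^{n} \frac{\partial^2}{\partial\boldsymbol{\theta}\,\partial\boldsymbol{\theta}'} \ln L(\boldsymbol{\theta}; X_i)$$

for whatever realization of $\boldsymbol{X}$ has been obtained. This should converge to the expected
information matrix by the law of large numbers, and it has been suggested that in
some situations this may even lead to more accurate inference (Efron and Hinkley
1978). In either case, the information matrices depend on the unknown parameters
of the model and are usually estimated by taking $I(\hat{\boldsymbol{\theta}})$ or $\bar{I}(\hat{\boldsymbol{\theta}})$.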
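
The observed Fisher information matrix can be approximated without analytical derivatives by differencing the log-likelihood numerically. The sketch below does this for an iid normal sample with $\boldsymbol{\theta} = (\mu, \sigma)'$; the model, the central-difference scheme, the step size and all names are assumptions made for illustration.

```python
import math
import random


def normal_log_density(x, theta):
    """Log-density of N(mu, sigma^2) at x, with theta = (mu, sigma)."""
    mu, sigma = theta
    return -0.5 * math.log(2 * math.pi) - math.log(sigma) - (x - mu) ** 2 / (2 * sigma ** 2)


def observed_information(log_density, theta, data, step=1e-3):
    """-(1/n) * sum_i Hessian of log_density(X_i, theta), by central differences."""
    n = len(data)
    p = len(theta)

    def mean_loglik(shift_i, i, shift_j, j):
        shifted = list(theta)
        shifted[i] += shift_i * step
        shifted[j] += shift_j * step
        return sum(log_density(x, shifted) for x in data) / n

    info = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(p):
            second = (
                mean_loglik(1, i, 1, j) - mean_loglik(1, i, -1, j)
                - mean_loglik(-1, i, 1, j) + mean_loglik(-1, i, -1, j)
            ) / (4 * step ** 2)
            info[i][j] = -second
    return info


def invert_2x2(m):
    """Inverse of a 2 x 2 matrix given as nested lists."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]


rng = random.Random(3)  # fixed seed so that the run is reproducible
data = [rng.gauss(1.0, 2.0) for _ in range(1000)]  # illustrative sample
n = len(data)
mu_hat = sum(data) / n
sigma_hat = math.sqrt(sum((x - mu_hat) ** 2 for x in data) / n)

info = observed_information(normal_log_density, [mu_hat, sigma_hat], data)
cov = invert_2x2(info)
std_errors = [math.sqrt(cov[j][j] / n) for j in range(2)]
print(info)
print(std_errors)
```

For this particular model the observed information at the MLE is diagonal with entries $1/\hat\sigma^2$ and $2/\hat\sigma^2$, which gives a convenient check on the numerical differencing.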


A.3.4 Wald Test and Confidence Intervals 

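From (A.25) we have that, for $n$ sufficiently large,

$$\frac{\hat\theta_j - \theta_j}{\mathrm{se}(\hat\theta_j)} \sim N(0, 1), \tag{A.26}$$

where $\mathrm{se}(\hat\theta_j)$ denotes an asymptotic standard error (an estimate of the asymptotic
standard deviation) for $\hat\theta_j$, given by

$$\mathrm{se}(\hat\theta_j) = \sqrt{n^{-1} \bigl(I(\hat{\boldsymbol{\theta}})^{-1}\bigr)_{jj}} \quad \text{or} \quad \sqrt{n^{-1} \bigl(\bar{I}(\hat{\boldsymbol{\theta}})^{-1}\bigr)_{jj}}.$$

Equation (A.26) can be used to test the null hypothesis $H_0\colon \theta_j = \theta_{j,0}$ for some
value of interest $\theta_{j,0}$ against the alternative $H_1\colon \theta_j \neq \theta_{j,0}$. For an asymptotic test
of size $\alpha$ we would reject $H_0$ if $|Z| \geq \Phi^{-1}(1 - \tfrac{1}{2}\alpha)$, where $Z$ denotes the left-hand side
of (A.26) evaluated at $\theta_j = \theta_{j,0}$.

An asymptotic $100(1 - \alpha)\%$ confidence interval for $\theta_j$ consists of those values
$\theta_{j,0}$ for which the null hypothesis is not rejected and it is given by

$$\bigl(\hat\theta_j - \mathrm{se}(\hat\theta_j)\,\Phi^{-1}(1 - \tfrac{1}{2}\alpha),\ \hat\theta_j + \mathrm{se}(\hat\theta_j)\,\Phi^{-1}(1 - \tfrac{1}{2}\alpha)\bigr). \tag{A.27}$$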
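
The Wald test and interval need only an estimate and its standard error. The following sketch implements both; as an assumed example it uses an iid exponential sample with density $f(x; \theta) = \theta e^{-\theta x}$, for which the Fisher information is $1/\theta^2$ and hence $\mathrm{se}(\hat\theta) = \hat\theta/\sqrt{n}$. Function names and numerical values are illustrative.

```python
import math
import random
from statistics import NormalDist


def wald_interval(estimate, std_error, level=0.95):
    """Interval (A.27): estimate -/+ std_error * Phi^{-1}(1 - alpha / 2)."""
    alpha = 1 - level
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return estimate - std_error * z, estimate + std_error * z


def wald_test_rejects(estimate, std_error, null_value, size=0.05):
    """Reject H0: theta_j = null_value when |Z| >= Phi^{-1}(1 - size / 2)."""
    z_stat = (estimate - null_value) / std_error
    return abs(z_stat) >= NormalDist().inv_cdf(1 - size / 2)


rng = random.Random(4)  # fixed seed so that the run is reproducible
data = [rng.expovariate(2.0) for _ in range(400)]  # illustrative sample
n = len(data)
theta_hat = n / sum(data)  # MLE of the exponential rate
se = theta_hat / math.sqrt(n)  # sqrt(n^{-1} I(theta_hat)^{-1}) with I(theta) = 1 / theta^2

lower, upper = wald_interval(theta_hat, se)
print(theta_hat, (lower, upper))
print(wald_test_rejects(theta_hat, se, null_value=theta_hat + 5 * se))
```

By construction the interval is symmetric about the estimate, and a null value lies inside it exactly when the corresponding test does not reject.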
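
A.3.5 Likelihood Ratio Test and Confidence Intervals


Now consider testing the null hypothesis $H_0\colon \boldsymbol{\theta} \in \Theta_0$ against the alternative
$H_1\colon \boldsymbol{\theta} \in \Theta_0^{c}$, where $\Theta_0 \subset \Theta$. We consider the likelihood ratio test statistic

$$\lambda(\boldsymbol{X}) = \frac{\sup_{\boldsymbol{\theta} \in \Theta_0} L(\boldsymbol{\theta}; \boldsymbol{X})}{\sup_{\boldsymbol{\theta} \in \Theta} L(\boldsymbol{\theta}; \boldsymbol{X})}$$

and assume, as before, that $X_1, \ldots, X_n$ are iid and that appropriate regularity
conditions apply. Under the null hypothesis it can be shown that, as $n \to \infty$,
$-2 \ln \lambda(\boldsymbol{X}) \sim \chi^2_\nu$, where the degrees-of-freedom parameter $\nu$ of the chi-squared
distribution is essentially given by the number of free parameters specified by $\Theta$
minus the number of free parameters specified by the null hypothesis $\boldsymbol{\theta} \in \Theta_0$.

For example, suppose that we partition $\boldsymbol{\theta}$ such that $\boldsymbol{\theta}' = (\boldsymbol{\theta}_1', \boldsymbol{\theta}_2')$, where $\boldsymbol{\theta}_1$ has
dimension $q$ and $\boldsymbol{\theta}_2$ has dimension $p - q$. We wish to test $H_0\colon \boldsymbol{\theta}_1 = \boldsymbol{\theta}_{1,0}$ against
$H_1\colon \boldsymbol{\theta}_1 \neq \boldsymbol{\theta}_{1,0}$. Writing the likelihood as $L(\boldsymbol{\theta}_1, \boldsymbol{\theta}_2)$, the likelihood ratio test statistic
satisfies

$$-2 \ln \lambda(\boldsymbol{X}) = -2\bigl(\ln L(\boldsymbol{\theta}_{1,0}, \hat{\boldsymbol{\theta}}_{2,0}; \boldsymbol{X}) - \ln L(\hat{\boldsymbol{\theta}}_1, \hat{\boldsymbol{\theta}}_2; \boldsymbol{X})\bigr) \sim \chi^2_q$$

asymptotically, where $\hat{\boldsymbol{\theta}}_1$ and $\hat{\boldsymbol{\theta}}_2$ are the unconstrained MLEs of $\boldsymbol{\theta}_1$ and $\boldsymbol{\theta}_2$, and $\hat{\boldsymbol{\theta}}_{2,0}$
is the constrained MLE of $\boldsymbol{\theta}_2$ under the null hypothesis. We would reject $H_0$ if
$-2 \ln \lambda(\boldsymbol{X}) > c_{q,1-\alpha}$, where $c_{q,1-\alpha}$ is the $(1 - \alpha)$-quantile of the $\chi^2_q$ distribution.

An asymptotic $100(1 - \alpha)\%$ confidence set for $\boldsymbol{\theta}_1$ consists of the values $\boldsymbol{\theta}_{1,0}$ for
which the null hypothesis $H_0\colon \boldsymbol{\theta}_1 = \boldsymbol{\theta}_{1,0}$ is not rejected: that is,

$$\{\boldsymbol{\theta}_{1,0} : \ln L(\boldsymbol{\theta}_{1,0}, \hat{\boldsymbol{\theta}}_{2,0}; \boldsymbol{X}) \geq \ln L(\hat{\boldsymbol{\theta}}_1, \hat{\boldsymbol{\theta}}_2; \boldsymbol{X}) - 0.5\,c_{q,1-\alpha}\}.$$

In particular, if $q = 1$, so that we are interested only in $\theta_1$, we get the confidence
interval

$$\{\theta_{1,0} : \ln L(\theta_{1,0}, \hat{\boldsymbol{\theta}}_{2,0}; \boldsymbol{X}) \geq \ln L(\hat\theta_1, \hat{\boldsymbol{\theta}}_2; \boldsymbol{X}) - 0.5\,c_{1,1-\alpha}\}. \tag{A.28}$$

Note that such an interval will, in general, be asymmetric about the MLE $\hat\theta_1$, in the
sense that the distances from the MLE to the upper and lower bounds will be different.
This is in contrast to the Wald interval in (A.27), which is rigidly symmetric.

The curve $(\theta_{1,0}, \ln L(\theta_{1,0}, \hat{\boldsymbol{\theta}}_{2,0}; \boldsymbol{X}))$ is sometimes known as the profile log-likelihood
curve for $\theta_1$ and it attains its maximum at $\hat\theta_1$.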
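
A profile likelihood interval of the form (A.28) can be computed by locating the two points at which the profile log-likelihood falls $0.5\,c_{1,1-\alpha}$ below its maximum. The sketch below does this for the standard deviation $\sigma$ of an iid normal sample, treating the mean $\mu$ as the nuisance parameter. The model, the bisection search and all names are illustrative assumptions; the cutoff uses the fact that a $\chi^2_1$ variable is the square of a standard normal variable.

```python
import math
import random
from statistics import NormalDist


def profile_loglik_sigma(sigma0, data):
    """Profile log-likelihood ln L(sigma0, mu_hat(sigma0); x) for iid N(mu, sigma^2) data.

    For fixed sigma0 the constrained MLE of the nuisance parameter mu is the
    sample mean, whatever the value of sigma0.
    """
    n = len(data)
    mu_hat = sum(data) / n
    ss = sum((x - mu_hat) ** 2 for x in data)
    return -n * math.log(sigma0) - ss / (2 * sigma0 ** 2) - 0.5 * n * math.log(2 * math.pi)


def bisect_root(func, lo, hi, iterations=100):
    """Root of func on [lo, hi], assuming func(lo) and func(hi) differ in sign."""
    f_lo = func(lo)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        f_mid = func(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def profile_interval_sigma(data, level=0.95):
    """MLE of sigma and the end points of the interval (A.28) for sigma."""
    n = len(data)
    mu_hat = sum(data) / n
    sigma_hat = math.sqrt(sum((x - mu_hat) ** 2 for x in data) / n)
    # c_{1,1-alpha}: squared standard normal quantile at 1 - alpha / 2.
    cutoff = NormalDist().inv_cdf(1 - (1 - level) / 2) ** 2
    threshold = profile_loglik_sigma(sigma_hat, data) - 0.5 * cutoff

    def excess(s):
        return profile_loglik_sigma(s, data) - threshold

    lower = bisect_root(excess, sigma_hat / 10, sigma_hat)
    upper = bisect_root(excess, sigma_hat, sigma_hat * 10)
    return sigma_hat, lower, upper


rng = random.Random(6)  # fixed seed so that the run is reproducible
data = [rng.gauss(0.0, 1.5) for _ in range(50)]  # illustrative sample
sigma_hat, lower, upper = profile_interval_sigma(data)
print(lower, sigma_hat, upper)
```

In this example the upper end point lies further from the MLE than the lower one, which illustrates the asymmetry that distinguishes the interval from the Wald interval.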


A.3.6 Akaike Information Criterion 


The likelihood ratio test is applicable to the comparison of nested models, i.e. situ- 
ations where one model forms a special case of a more general model when certain 
parameter values are constrained. We often encounter situations where we would like 
to compare non-nested models with possibly quite different numbers of parameters. 

Suppose we have $m$ models $M_1, \ldots, M_m$ and that model $j$ has $k_j$ parameters
denoted by $\boldsymbol{\theta}_j = (\theta_{j1}, \ldots, \theta_{jk_j})'$ and a likelihood function $L_j(\boldsymbol{\theta}_j; \boldsymbol{X})$. In Akaike’s
approach we choose the model minimizing

$$\mathrm{AIC}(M_j) = -2 \ln L_j(\hat{\boldsymbol{\theta}}_j; \boldsymbol{X}) + 2k_j,$$

where $\hat{\boldsymbol{\theta}}_j$ denotes the MLE of $\boldsymbol{\theta}_j$. The AIC number essentially imposes a penalty
equal to the number of model parameters $k_j$ on the value of the log-likelihood at
the maximum. The model that is favoured is the one for which the penalized log-likelihood
$\ln L_j(\hat{\boldsymbol{\theta}}_j; \boldsymbol{X}) - k_j$ is largest. There are alternatives to the AIC, such as the
Bayesian information criterion (BIC) of Schwarz, which impose different penalties
for the number of parameters. See Burnham and Anderson (2002) for more about
model comparison using these criteria.
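
As an illustration of comparing non-nested models, the sketch below computes the AIC for an exponential model (one parameter) and a lognormal model (two parameters) fitted to the same positive data. Both candidate models, the simulated data and the function names are assumptions chosen because their MLEs are available in closed form.

```python
import math
import random


def aic(max_log_likelihood, num_params):
    """AIC(M_j) = -2 ln L_j(theta_hat_j; X) + 2 k_j."""
    return -2.0 * max_log_likelihood + 2.0 * num_params


def exponential_max_loglik(data):
    """Maximized log-likelihood of f(x; theta) = theta * exp(-theta * x); the MLE is 1 / mean."""
    n = len(data)
    theta_hat = n / sum(data)
    return n * math.log(theta_hat) - theta_hat * sum(data)


def lognormal_max_loglik(data):
    """Maximized log-likelihood of the lognormal model, with MLEs from the log-data."""
    n = len(data)
    logs = [math.log(x) for x in data]
    mu_hat = sum(logs) / n
    sigma2_hat = sum((v - mu_hat) ** 2 for v in logs) / n
    return -sum(logs) - 0.5 * n * math.log(2 * math.pi * sigma2_hat) - 0.5 * n


rng = random.Random(5)  # fixed seed so that the run is reproducible
data = [rng.lognormvariate(0.0, 0.5) for _ in range(500)]  # illustrative sample

aic_values = {
    "exponential": aic(exponential_max_loglik(data), num_params=1),
    "lognormal": aic(lognormal_max_loglik(data), num_params=2),
}
best_model = min(aic_values, key=aic_values.get)
print(aic_values, best_model)
```

Since the data are simulated from a lognormal distribution, the lognormal model attains the smaller AIC despite carrying one more parameter.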


